WO2023019636A1 - Defect point recognition method based on a deep learning network (一种基于深度学习网络的瑕疵点识别方法) - Google Patents

Defect point recognition method based on a deep learning network

Info

Publication number
WO2023019636A1
Authority
WO
WIPO (PCT)
Prior art keywords
branch
transformer
encoder
attention
decoder
Application number
PCT/CN2021/115494
Other languages
English (en)
French (fr)
Inventor
王慧燕
姜欢
Original Assignee
浙江工商大学 (Zhejiang Gongshang University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-18
Filing date
2021-08-31
Publication date
2023-02-23
Application filed by 浙江工商大学 (Zhejiang Gongshang University)
Priority to US17/810,608 (granted as US11615523B2)
Publication of WO2023019636A1

Classifications

    • G06F18/2414 - Pattern recognition; classification techniques based on distances to training or reference patterns; smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F18/2415 - Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/044 - Computing arrangements based on biological models; neural networks; architecture; recurrent networks, e.g. Hopfield networks
    • G06N3/084 - Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent

Definitions

  • The invention belongs to the field of image processing and object detection in computer vision, and relates to a method for detecting defects based on a deep learning network.
  • Traditional object detection generates region proposals, extracts the features inside each proposed box, and finally classifies them.
  • Traditional object detection algorithms have bottlenecks in both speed and accuracy. With the rapid development of deep learning neural network algorithms, object detection tasks in videos and images have also developed rapidly.
  • Defect detection is a very important task in industry. At present it relies mainly on workers picking out defects by hand, which is very time-consuming, incurs high labor costs, and requires experienced workers.
  • Existing deep-learning-based object detection can be divided into two categories: one-stage and two-stage detection. Of these two directions, the former is faster while the latter achieves better accuracy; however, neither achieves an ideal effect on small objects.
  • Building on a one-stage deep learning model, the present invention improves the existing algorithm and proposes a detection method suited to defect points, raising the detection rate of centimeter-scale small defect targets while also increasing detection speed.
  • Aiming at the deficiencies of the prior art, the present invention provides a defect point recognition method based on a deep learning network.
  • The present invention comprises the following steps:
  • Step 1: Capture a video image sequence containing defect points and input it to the feature extraction branch for feature extraction.
  • Step 2: Flatten the feature map output by the feature extraction branch, add the position-encoding information through the position-information branch, and input the result to the encoder (transformer-encoder) of the transformer branch.
  • The Self-Attention layer in the transformer-encoder is improved, specifically: a position-based Gaussian-distribution attention branch for enhancing locality is added to the Self-Attention layer. At the same time, the Key and Value matrices input to the transformer-encoder are reduced in dimension by convolution, to speed up the computation of the Self-Attention layer.
  • Step 4: Use the output of the last layer of the transformer-encoder as the input of the decoder (transformer-decoder) in the transformer branch, and feed the learnable matrix of object queries into the transformer-decoder.
  • Step 5: Input the result of the transformer-decoder to the feedforward neural network prediction branch, obtaining a classification branch and a regression branch; the classification branch classifies through a fully connected layer, and the regression branch regresses the bounding boxes through a multi-layer perceptron.
  • Step 6: Train the network. During training, the classification loss is the cross-entropy loss, and the regression loss comprises the L_iou loss and the L1 loss.
  • The network consists of the feature extraction branch, the position-information branch, the transformer branch, and the feedforward neural network prediction branch.
  • Step 7: When performing forward inference with the trained network, the model predicts the defect points in the image.
  • When computing the Self-Attention layer in the encoder, the present invention computes in parallel a position-based Gaussian-distribution attention that enhances locality, and superimposes it on the original attention, so that the Self-Attention layer gains locality, learns the characteristics of centimeter-scale defects better, and is more suitable for detecting small defects. At the cost of only one extra matrix addition, without affecting speed, the detection accuracy for small defect targets is improved.
  • Considering that the main factor limiting the speed of the transformer branch is the computational complexity of the attention formula, the present invention uses a 3*3 convolution kernel with stride 2 to reduce the number of parameters in the K and V matrices. After the transformer branch, the result is output to the feedforward neural network prediction branch, where classification and prediction-box regression are performed simultaneously.
  • The present invention achieves high-precision defect detection without any other auxiliary equipment.
  • Figure 1 is the overall network structure diagram;
  • Figure 2 is the transformer branch structure diagram;
  • Figure 3 is the structure diagram of the feature-map conversion matrices and the Self-Attention layer;
  • Figure 4 is a flowchart of training with the method of the invention;
  • Figure 5 is a flowchart of forward inference with the method of the invention.
  • The invention proposes a defect recognition method based on a deep learning network.
  • Its network framework is shown in Figure 1. It mainly comprises four branches: the feature extraction branch (Resnet-50), the positional-encoding branch (Positional encoding), the transformer branch (transformer encoder and decoder), and the feedforward neural network prediction branch (FNN).
  • The network training process is shown in Figure 4.
  • The brief steps are as follows: 1. Obtain the feature map of the defect through the feature extraction network; 2. Flatten the extracted feature map and add the position encoding; 3. Pass through the improved encoder of the transformer branch; 4. Pass through the decoder of the transformer branch; 5. Input to the feedforward neural network prediction branch (FNN) for regression and classification; 6. Compute the classification cross-entropy loss L_Class, the L1 loss L_1, and the intersection-over-union loss L_iou, and finally compute the Hungarian loss function; 7. Backpropagate to update the network parameters.
  • The forward-inference process of the network is shown in Figure 5.
  • The brief steps are as follows: 1. Obtain the feature map of the defect through the feature extraction network; 2. Flatten the extracted feature map and add the position encoding; 3. Pass through the improved encoder of the transformer branch; 4. Pass through the decoder of the transformer branch; 5. Input to the feedforward neural network prediction branch (FNN) for regression and classification; 6. When performing forward inference with the trained network, predict the defect locations and the defect categories.
  • A method for identifying defects based on a deep learning network, comprising the following steps:
  • Step 1: Capture a video image sequence containing defect points and input it into the Resnet-50 network for feature extraction.
  • Step 2: Flatten the output feature map, add the position-encoding information, and feed it into the transformer-encoder. Specifically:
  • The flattening operation is as follows: reshape the feature map from 7*7*256 to 49*256, i.e. turn H*W*C into (H*W)*C, compressing height and width into one dimension. The flattened feature map is denoted X.
  • The position-encoding operation is as follows: split the 256 dimensions of the 49*256 feature map into an x direction and a y direction, where the first 128 dimensions encode x and the last 128 dimensions encode y.
  • The matrix computed from the position-encoding functions is added to the Query and Key matrices in steps 3 and 4.
  • In the position-encoding functions, x represents the position of the current pixel in its image row or column, 2k and 2k+1 indicate whether the current dimension index is even or odd, and d represents the dimension of the vector.
  • Step 3: Add the Gaussian-distribution attention branch to the Self-Attention layer in the transformer-encoder, and reduce the dimensions of the Key and Value matrices by convolution.
  • The transformer-encoder structure, from bottom to top, is: Self-Attention layer, Add&Norm layer, FNN layer, Add&Norm layer.
  • In the first step, the feature map X is converted into the Q, K, and V matrices, as shown on the left side of Figure 3, and then passed through the Self-Attention layer, whose specific structure is shown on the right side of Figure 3.
  • In the second step, the Add&Norm layer is applied: the Add operation directly adds the result computed by the Self-Attention layer to the original feature map X, similar to a residual structure, and the Norm operation normalizes the result of the Add operation.
  • In the third step, the FNN layer is applied: a fully connected layer first changes the number of channels from 256 to 2048, followed by the ReLU activation function with dropout set to 0.1; finally another fully connected layer changes the number of channels from 2048 back to 256, with dropout set to 0.1.
  • The fourth step passes through another Add&Norm layer, similar to the second step.
  • The K and V matrices are modified before the Self-Attention layer in the encoder, as shown on the left side of Figure 3.
  • From the input feature map X, three linear layers (W_Q, W_K, W_V) produce the corresponding Q, K, and V matrices, each of size 49*256. The position encoding obtained in step 2 is added to the Q and K matrices; K and V are reshaped to 7*7*256 and then convolved.
  • A 3*3 convolution kernel is used, with a convolution stride of 2.
  • The number of convolution kernels is equal to the number of input channels, and the resulting 3*3*256 output is then flattened.
  • A Gaussian-distribution attention branch is added to the Self-Attention layer in the encoder, as shown on the right side of Figure 3. The reason for the addition is that a defect usually covers the current pixel and its surrounding pixels; although Resnet-50 feature extraction captures locality around the current pixel, the deeper the network becomes, the less sensitive it is to locality, which is unfavorable for detecting small defects.
  • Six encoders are set, the above two operations are performed on the Self-Attention of each encoder, and the final result is input into the decoder.
  • Step 4: Output Q from the last layer of the transformer encoder into the decoder, and feed the object queries (a learnable matrix) into the decoder.
  • The object queries are a randomly initialized matrix of size 100*256, where 100 is the preset number of targets.
  • During training, it learns the feature encoding input by the encoder; the decoder then converts these 100 queries into 100 target outputs, i.e. it learns the relationship between the targets and the content of the whole image, and finally outputs to the FNN (feedforward neural network prediction branch) for classification and prediction.
  • The transformer-decoder structure, from bottom to top, is: Self-Attention layer, Add&Norm layer, Self-Attention layer, Add&Norm layer, FNN layer, Add&Norm layer.
  • In the first step, the object queries (learnable matrix) are converted into Q, K, and V matrices; the Q and K matrices are added to the object queries to learn the relationship between the targets and the whole image, and then the Q, K, and V matrices are input to the Self-Attention layer;
  • the second and third steps pass through an Add&Norm layer and a second Self-Attention layer whose K and V come from the encoder output;
  • the fourth step passes through an Add&Norm layer;
  • the fifth step passes through the FNN layer, whose settings are consistent with the encoder;
  • Step 5: The result of the transformer-decoder is input to two branches.
  • The classification branch classifies through a fully connected layer;
  • the regression branch performs bounding-box regression through a multi-layer perceptron. Specifically:
  • The decoder finally outputs 6*100*256, where 6 corresponds to the 6 decoders.
  • This embodiment only needs to take the result of the last decoder.
  • During training, however, the results of all 6 decoders are output.
  • For the classification branch, a fully connected layer directly changes the number of channels from 256 to n, outputting 100*n, where n is the number of defect categories in the defect dataset; for the prediction boxes a 3-layer MLP is used, with 256 input channels, a hidden layer of 256, and an output layer of 4, outputting 100*4.
  • The classification loss function is the cross-entropy loss function; finally, the classification and regression losses are fed into the Hungarian loss function to compute the loss.
  • The Hungarian loss function is given in the detailed description below.
  • Step 7: When performing forward inference with the trained network, predict the defect locations and the defect categories.


Abstract

A defect point recognition method based on a deep learning network. The method first passes an image of a defect through the Resnet-50 feature extraction network to extract the defect's features, and then detects and recognizes the defect through an improved transformer network. By improving the transformer module of the DETR network, the method can detect defects more precisely while also improving speed.

Description

A defect point recognition method based on a deep learning network
Technical Field
The invention belongs to the field of image processing and object detection in computer vision, and relates to a method for detecting defects based on a deep learning network.
Background Art
Traditional object detection generates region proposals, extracts the features inside each proposed box, and finally classifies them. Traditional object detection algorithms have bottlenecks in both speed and accuracy; with the rapid development of deep learning neural network algorithms, object detection tasks in videos and images have also developed rapidly.
Defect detection is a very important task in industry. At present it relies mainly on workers picking out defects by hand, which is very time-consuming, incurs high labor costs, and requires experienced workers. Existing deep-learning-based object detection can be divided into two categories: one-stage and two-stage detection. Of these two directions, the former is faster while the latter achieves better accuracy, but neither achieves an ideal effect on small objects. Building on a one-stage deep learning model, the present invention improves the existing algorithm and proposes a detection method suited to defect points, raising the detection rate of centimeter-scale small defect targets while also increasing detection speed.
Summary of the Invention
Aiming at the deficiencies of the prior art, the present invention provides a defect point recognition method based on a deep learning network.
The technical solution adopted by the present invention to solve the technical problem is as follows:
The present invention comprises the following steps:
Step 1: Capture a video image sequence containing defect points and input it to the feature extraction branch for feature extraction.
Step 2: Flatten the feature map output by the feature extraction branch, add the position-encoding information through the position-information branch, and input the result to the encoder (transformer-encoder) of the transformer branch.
The Self-Attention layer in the transformer-encoder is improved, specifically: a position-based Gaussian-distribution attention branch for enhancing locality is added to the Self-Attention layer.
At the same time, the Key and Value matrices input to the transformer-encoder are reduced in dimension by convolution, to speed up the computation of the Self-Attention layer.
Step 4: Use the output of the last layer of the transformer-encoder as the input of the decoder (transformer-decoder) in the transformer branch, and feed the learnable matrix of object queries into the transformer-decoder.
Step 5: Input the result of the transformer-decoder to the feedforward neural network prediction branch, obtaining a classification branch and a regression branch; the classification branch classifies through a fully connected layer, and the regression branch regresses the bounding boxes through a multi-layer perceptron.
Step 6: Train the network. During training, the classification loss is the cross-entropy loss, and the regression loss comprises the L_iou loss and the L1 loss.
The network consists of the feature extraction branch, the position-information branch, the transformer branch, and the feedforward neural network prediction branch.
Step 7: When performing forward inference with the trained network, the model predicts the defect points in the image.
Beneficial effects of the present invention:
When computing the Self-Attention layer in the encoder, the present invention computes in parallel a position-based Gaussian-distribution attention that enhances locality, and superimposes it on the original attention, so that the Self-Attention layer gains locality, learns the characteristics of centimeter-scale defects better, and is more suitable for detecting small defect targets. At the cost of only one extra matrix addition, without affecting speed, the detection accuracy for small defect targets is improved.
Considering that the main factor limiting the speed of the transformer branch is the computational complexity of the attention formula, the present invention uses a 3*3 convolution kernel with stride 2 to reduce the number of parameters in the K and V matrices. After the transformer branch, the result is output to the feedforward neural network prediction branch, where classification and prediction-box regression are performed simultaneously. The present invention achieves high-precision defect detection without any other auxiliary equipment.
Brief Description of the Drawings
To present more clearly the network structure and the training and forward-inference processes in the embodiments, the drawings needed for the embodiments are briefly introduced below.
Figure 1 is the overall network structure diagram;
Figure 2 is the transformer branch structure diagram;
Figure 3 is the structure diagram of the feature-map conversion matrices and the Self-Attention layer;
Figure 4 is a flowchart of training with the method of the invention;
Figure 5 is a flowchart of forward inference with the method of the invention.
Detailed Description of the Embodiments
To describe the present invention more concretely, the technical solution of the present invention is explained in detail below with reference to the drawings and specific embodiments.
The present invention proposes a defect recognition method based on a deep learning network. Its network framework is shown in Figure 1. It mainly comprises four branches: the feature extraction branch (Resnet-50), the positional-encoding branch (Positional encoding), the transformer branch (transformer encoder and decoder), and the feedforward neural network prediction branch (FNN).
The network training process is shown in Figure 4. The brief steps are as follows: 1. Obtain the feature map of the defect through the feature extraction network; 2. Flatten the extracted feature map and add the position encoding; 3. Pass through the improved encoder of the transformer branch; 4. Pass through the decoder of the transformer branch; 5. Input to the feedforward neural network prediction branch (FNN) for regression and classification; 6. Compute the classification cross-entropy loss L_Class, the L1 loss L_1, and the intersection-over-union loss L_iou, and finally compute the Hungarian loss function; 7. Backpropagate to update the network parameters.
The forward-inference process of the network is shown in Figure 5. The brief steps are as follows: 1. Obtain the feature map of the defect through the feature extraction network; 2. Flatten the extracted feature map and add the position encoding; 3. Pass through the improved encoder of the transformer branch; 4. Pass through the decoder of the transformer branch; 5. Input to the feedforward neural network prediction branch (FNN) for regression and classification; 6. When performing forward inference with the trained network, predict the defect locations and the defect categories. A minimal end-to-end inference sketch follows.
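The following sketch illustrates the inference pipeline just described, assuming a model object that returns the prediction branch's 100 class logits and 100 boxes; the 0.7 confidence threshold is a hypothetical choice, not taken from the patent:

```python
import torch

@torch.no_grad()
def infer(model, image: torch.Tensor):
    # model wraps backbone -> position encoding -> improved encoder ->
    # decoder with object queries -> FNN heads (see the sketches below).
    logits, boxes = model(image)        # (100, n) class logits, (100, 4) boxes
    probs = logits.softmax(-1)
    scores, labels = probs.max(-1)
    keep = scores > 0.7                 # hypothetical confidence threshold
    return boxes[keep], labels[keep], scores[keep]
```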
Embodiment:
A defect recognition method based on a deep learning network, comprising the following steps:
Step 1: Capture a video image sequence containing defect points and input it into the Resnet-50 network for feature extraction. Specifically:
The image is first input into the Resnet-50 feature extraction network, yielding a 7*7*2048 feature map; a convolution with kernel size 1, stride 1, and 256 kernels then reduces the number of channels, and the feature map after the convolution is 7*7*256.
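A minimal sketch of this feature-extraction branch, assuming torchvision's standard Resnet-50; it illustrates the shapes described above and is not the authors' code:

```python
import torch
import torch.nn as nn
import torchvision

class Backbone(nn.Module):
    def __init__(self, out_channels: int = 256):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # Keep everything up to the final 7x7x2048 convolutional stage.
        self.body = nn.Sequential(*list(resnet.children())[:-2])
        # 1x1 convolution, stride 1, 256 kernels: 2048 -> 256 channels.
        self.reduce = nn.Conv2d(2048, out_channels, kernel_size=1, stride=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.body(x)        # (B, 2048, 7, 7) for a 224x224 input
        return self.reduce(feat)   # (B, 256, 7, 7)
```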
Step 2: Flatten the output feature map, add the position-encoding information, and put it into the transformer-encoder. Specifically:
The flattening operation is as follows: reshape the feature map from 7*7*256 to 49*256, i.e. turn H*W*C into (H*W)*C, compressing height and width into one dimension; the flattened feature map is denoted X.
The position-encoding operation is as follows: split the 256 dimensions of the 49*256 feature map into an x direction and a y direction, where the first 128 dimensions encode x and the last 128 dimensions encode y. The x direction is encoded first with the position-encoding functions below, taking d = 256 in the formulas and substituting k as the current position index, with x the corresponding value in the feature map; the sin function is used for even indices and the cos function for odd indices. Position encoding is then added for the y direction. After the computation, the resulting matrix is added to the Query and Key matrices in steps 3 and 4.
$$PE_{(x,\,2k)}=\sin\!\left(\frac{x}{10000^{2k/d}}\right)$$
$$PE_{(x,\,2k+1)}=\cos\!\left(\frac{x}{10000^{2k/d}}\right)$$
Here x represents the position of the current pixel in its image row or column, 2k and 2k+1 indicate whether the current dimension index is even or odd, and d represents the dimension of the vector.
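A hedged sketch of this flatten-and-encode step; it uses the per-direction dimension of 128 as the effective d in the divisor, which is an assumption of this sketch (the patent's formulas are written with d = 256):

```python
import math
import torch

def positional_encoding_2d(h: int = 7, w: int = 7, d: int = 256) -> torch.Tensor:
    half = d // 2                              # 128 dims for x, 128 for y
    pe = torch.zeros(h, w, d)
    # One frequency per (sin, cos) pair, as in the sinusoidal formulas above.
    div = torch.exp(torch.arange(0, half, 2) * (-math.log(10000.0) / half))
    xs = torch.arange(w).float()               # column (x) positions
    ys = torch.arange(h).float()               # row (y) positions
    pe[:, :, 0:half:2]    = torch.sin(xs[None, :, None] * div)   # x, even dims
    pe[:, :, 1:half:2]    = torch.cos(xs[None, :, None] * div)   # x, odd dims
    pe[:, :, half::2]     = torch.sin(ys[:, None, None] * div)   # y, even dims
    pe[:, :, half + 1::2] = torch.cos(ys[:, None, None] * div)   # y, odd dims
    return pe.flatten(0, 1)                    # (H*W, d) = (49, 256)
```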
Step 3: Add the Gaussian-distribution attention branch to the Self-Attention layer in the transformer-encoder, and reduce the dimensions of the Key and Value matrices by convolution.
As shown in Figure 2, the transformer-encoder structure, from bottom to top, is: Self-Attention layer, Add&Norm layer, FNN layer, Add&Norm layer. The specific flow is as follows:
In the first step, the feature map X is converted into the Q, K, and V matrices, as shown on the left side of Figure 3, and then passed through the Self-Attention layer, whose specific structure is shown on the right side of Figure 3.
In the second step, the Add&Norm layer is applied: the Add operation directly adds the result computed by the Self-Attention layer to the original feature map X, similar to a residual structure, and the Norm operation normalizes the result of the Add operation.
In the third step, the FNN layer is applied: first a fully connected layer changes the number of channels from 256 to 2048, then the ReLU activation function is applied with dropout set to 0.1, and finally another fully connected layer changes the number of channels from 2048 back to 256, with dropout set to 0.1.
In the fourth step, another Add&Norm layer is applied, similar to the second step.
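A minimal sketch of the FNN sub-layer described in the third step above, assuming standard PyTorch modules (an illustration, not the authors' code):

```python
import torch.nn as nn

# Encoder FNN sub-layer: 256 -> 2048 -> ReLU -> dropout 0.1 -> 2048 -> 256
# -> dropout 0.1, as described in the third step.
ffn = nn.Sequential(
    nn.Linear(256, 2048),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(2048, 256),
    nn.Dropout(0.1),
)
```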
Before the Self-Attention layer in the encoder, this embodiment modifies the K and V matrices, as shown on the left side of Figure 3. Specifically: from the input feature map X, three linear layers (W_Q, W_K, W_V) produce the corresponding Q, K, and V matrices, each of size 49*256. The position encoding obtained in step 2 is added to the Q and K matrices; K and V are reshaped to 7*7*256 and then convolved, here with a 3*3 convolution kernel, a stride of 2, and a number of kernels equal to the number of input channels; the resulting 3*3*256 output is then flattened. Earlier experiments found the Q, K, and V matrices to be low-rank, i.e. their effective dimensionality during training is not that high, so processing them with a convolution does not lose much information.
Before the improvement, the Q, K, and V matrices are all of size H*W*C, giving a computational complexity of O((H*W)^2 * C); after the improvement the complexity is O(H*W * h*w * C), where H*W = 49 and, after the convolution, h*w = 9. Compared with the attention before the improvement, the computational complexity drops by roughly a factor of 5, greatly increasing the computation speed of the Self-Attention layer. A sketch of this reduction follows.
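A sketch of the K/V reduction under the stated settings (7x7 token grid, 3x3 kernel, stride 2, no padding, 256 output channels); an illustration, not the authors' code:

```python
import torch
import torch.nn as nn

# 3x3 convolution, stride 2, no padding: a 7x7 token grid becomes 3x3,
# shrinking the 49 K/V tokens to 9 while keeping 256 channels.
reduce_kv = nn.Conv2d(256, 256, kernel_size=3, stride=2)

def shrink(tokens: torch.Tensor, h: int = 7, w: int = 7) -> torch.Tensor:
    # tokens: (B, 49, 256) -> (B, 256, 7, 7) -> conv -> (B, 256, 3, 3) -> (B, 9, 256)
    b, n, c = tokens.shape
    grid = tokens.transpose(1, 2).reshape(b, c, h, w)
    return reduce_kv(grid).flatten(2).transpose(1, 2)
```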
This embodiment adds a Gaussian-distribution attention branch to the Self-Attention layer in the encoder, as shown on the right side of Figure 3. The reason for the addition is that a defect usually covers the current pixel and its surrounding pixels; although Resnet-50 feature extraction captures locality around the current pixel, the deeper the network becomes, the less sensitive it is to locality, which is unfavorable for detecting small defects.
While computing the Matmul (matrix multiplication) of Q and K and the softmax function of the original attention, this embodiment simultaneously computes the Gaussian attention of the Q and K matrices; the two results are then added, the sum is normalized through a Norm layer, and finally the result is multiplied by the V matrix to obtain the final attention matrix.
The Gaussian-distribution attention branch of this embodiment adopts a normal distribution with variance σ² = 1/(2π), whose probability density function is
$$f(s)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-\frac{s^{2}}{2\sigma^{2}}}=e^{-\pi s^{2}}$$
where s is the distance between the two vectors q_i (a row vector of the Q matrix) and k_i (a row vector of the K matrix). Since only one extra matrix addition is performed, the detection accuracy for defects is improved without affecting speed.
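A minimal sketch of this Gaussian attention branch, as one interpretation of the description: the distance s is taken between token positions on the query grid (7x7) and the reduced key grid (3x3), both normalized to [0, 1] so the two grids are comparable. These normalization choices are assumptions of the sketch, not stated in the patent:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def grid_coords(h: int, w: int) -> torch.Tensor:
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    return torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()

def gaussian_self_attention(q, k, v, q_hw=(7, 7), k_hw=(3, 3)):
    # q: (49, d); k, v: (9, d) after the convolutional reduction.
    d = q.size(-1)
    attn = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)   # (49, 9)
    # Position-based Gaussian branch: f(s) = exp(-pi * s^2).
    pq = grid_coords(*q_hw) / torch.tensor([q_hw[0] - 1.0, q_hw[1] - 1.0])
    pk = grid_coords(*k_hw) / torch.tensor([k_hw[0] - 1.0, k_hw[1] - 1.0])
    s = torch.cdist(pq, pk)                    # pairwise distances, (49, 9)
    gauss = torch.exp(-math.pi * s ** 2)
    # Add the two attention maps, normalize with a (freshly initialized)
    # LayerNorm over the key dimension, then weight V.
    mixed = nn.LayerNorm(attn.size(-1))(attn + gauss)
    return mixed @ v                           # (49, d)
```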
This embodiment sets 6 encoders, performs the above two operations on the Self-Attention of each encoder, and inputs the final result into the decoder.
Step 4: Output Q from the last layer of the transformer encoder into the decoder, and input the object queries (a learnable matrix) into the decoder. Specifically:
The object queries are a randomly initialized matrix of size 100*256, where 100 is the preset number of targets. During training it learns the feature encoding input by the encoder; the decoder then converts these 100 queries into 100 target outputs, i.e. it learns the relationship between the targets and the content of the whole image, and finally outputs to the FNN (feedforward neural network prediction branch) for classification and prediction.
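A hedged sketch of the object queries as a learnable 100*256 embedding; the decoder call in the comment is a hypothetical signature, shown only to indicate where the queries enter:

```python
import torch
import torch.nn as nn

num_queries, hidden_dim = 100, 256
object_queries = nn.Embedding(num_queries, hidden_dim)   # learnable 100x256
# The 100 query vectors are fed to the transformer-decoder together with
# the encoder output, e.g. (hypothetical signature):
#   out = decoder(tgt=torch.zeros(num_queries, hidden_dim),
#                 memory=encoder_out, query_pos=object_queries.weight)
```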
The transformer-decoder structure is shown in the dashed box in the right half of Figure 2; from bottom to top it is: Self-Attention layer, Add&Norm layer, Self-Attention layer, Add&Norm layer, FNN layer, Add&Norm layer. The specific flow is as follows:
In the first step, the object queries (learnable matrix) are converted into Q, K, and V matrices; the Q and K matrices are added to the object queries to learn the relationship between the targets and the whole image, and then the Q, K, and V matrices are input to the Self-Attention layer;
in the second step, an Add&Norm layer is applied;
in the third step, the Q of the Self-Attention is the sum of the output of the previous step and the object queries (learnable matrix); the K and V here are similar to the encoder's: the encoder's resulting attention matrix is passed through linear layers to obtain K and V, without reducing the dimensions of K and V here. Note that the position encoding must be added to the K matrix, while V does not need it;
in the fourth step, an Add&Norm layer is applied;
in the fifth step, the FNN layer is applied, with the same settings as in the encoder;
in the sixth step, an Add&Norm layer is applied.
Step 5: The result of the transformer-decoder is input to two branches: the classification branch classifies through a fully connected layer, and the regression branch performs bounding-box regression through a multi-layer perceptron. Specifically:
The decoder finally outputs 6*100*256, where 6 corresponds to the 6 decoders. This embodiment only needs to take the result of the last decoder; during training, however, supervising the remaining 5 decoders with the same loss improves performance, so the results of all 6 decoders are output during training. The classification branch directly uses a fully connected layer to change the number of channels from 256 to n, outputting 100*n, where n is the number of defect categories in the defect dataset; the prediction boxes use a 3-layer MLP with 256 input channels, a hidden layer of 256, and an output layer of 4, outputting 100*4.
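A minimal sketch of the two heads; the number of defect categories n is dataset-specific, so the value below is a placeholder:

```python
import torch.nn as nn

n = 5  # hypothetical number of defect categories; set from the dataset
class_head = nn.Linear(256, n)           # classification: 100*256 -> 100*n
box_head = nn.Sequential(                # 3-layer MLP: 100*256 -> 100*4
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 4),
)
```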
Step 6: During network training, the regression loss function consists of the L_iou and L1 loss functions, L_box = λ_iou·L_iou + λ_L1·L_1; experiments found λ_iou = 2.0 and λ_L1 = 5.0 to give good results. The classification loss function is the cross-entropy loss function. Finally, these two loss functions are fed into the Hungarian loss function to compute the loss. The Hungarian loss function is as follows:
$$\mathcal{L}_{Hungarian}(y,\hat{y})=\sum_{i=1}^{N}\Big[-\log\hat{p}_{\hat{\sigma}(i)}(c_{i})+\mathbb{1}_{\{c_{i}\neq\varnothing\}}\,\mathcal{L}_{box}\big(b_{i},\hat{b}_{\hat{\sigma}(i)}\big)\Big]$$
where c_i is the class label, σ̂(i) indexes the matched prediction, and p̂_{σ̂(i)}(c_i) is the predicted probability of class c_i; c_i (the class label) must not be the empty set, and if there is no class, the subsequent prediction-box regression does not need to be computed. L_box is specifically:
$$\mathcal{L}_{box}\big(b_{i},\hat{b}_{\hat{\sigma}(i)}\big)=\lambda_{iou}\,\mathcal{L}_{iou}\big(b_{i},\hat{b}_{\hat{\sigma}(i)}\big)+\lambda_{L1}\,\big\|b_{i}-\hat{b}_{\hat{\sigma}(i)}\big\|_{1}$$
where b̂_{σ̂(i)} is the predicted box for the predicted class and b_i is the ground-truth box of that class.
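A hedged sketch of the box loss with the stated weights, plus the Hungarian assignment step; taking L_iou = 1 - IoU and the cost-matrix construction are assumptions of this sketch:

```python
import torch
from scipy.optimize import linear_sum_assignment

LAMBDA_IOU, LAMBDA_L1 = 2.0, 5.0    # weights reported in the embodiment

def box_loss(pred_box: torch.Tensor, gt_box: torch.Tensor, iou: torch.Tensor):
    # L_box = 2.0 * L_iou + 5.0 * L1, assuming L_iou = 1 - IoU.
    return LAMBDA_IOU * (1.0 - iou) + LAMBDA_L1 * (pred_box - gt_box).abs().sum(-1)

def hungarian_match(cost: torch.Tensor):
    # cost: (100, num_gt) matching cost combining class probability and box
    # distance; returns optimally matched (prediction, ground-truth) pairs.
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return rows, cols
```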
Step 7: When performing forward inference with the trained network, predict the defect locations and the defect categories.

Claims (5)

  1. A defect point recognition method based on a deep learning network, characterized in that the method comprises the following steps:
    Step 1: capture a video image sequence containing defect points and input it to the feature extraction branch for feature extraction;
    Step 2: flatten the feature map output by the feature extraction branch, add the position-encoding information through the position-information branch, and input the result to the encoder (transformer-encoder) of the transformer branch;
    the Self-Attention layer in the transformer-encoder is improved, specifically: a position-based Gaussian-distribution attention branch for enhancing locality is added to the Self-Attention layer;
    at the same time, the Key and Value matrices input to the transformer-encoder are reduced in dimension by convolution, to speed up the computation of the Self-Attention layer;
    Step 4: use the output of the last layer of the transformer-encoder as the input of the decoder (transformer-decoder) in the transformer branch, and feed the learnable matrix of object queries into the transformer-decoder;
    Step 5: input the result of the transformer-decoder to the feedforward neural network prediction branch, obtaining a classification branch and a regression branch, wherein the classification branch classifies through a fully connected layer and the regression branch regresses the bounding boxes through a multi-layer perceptron;
    Step 6: train the network; during training, the classification loss is the cross-entropy loss, and the regression loss comprises the L_iou loss and the L1 loss;
    the network consists of the feature extraction branch, the position-information branch, the transformer branch, and the feedforward neural network prediction branch;
    Step 7: when performing forward inference with the trained network, the model predicts the defect points in the image.
  2. The defect point recognition method based on a deep learning network according to claim 1, characterized in that the feature extraction branch adopts the Resnet-50 network.
  3. The defect point recognition method based on a deep learning network according to claim 2, characterized in that the convolutional dimension reduction uses a 3*3 convolution kernel with a stride of 2, the number of kernels being equal to the number of input channels.
  4. The defect point recognition method based on a deep learning network according to claim 1, characterized in that the inputs of the Gaussian-distribution attention branch are the Query matrix and the convolution-reduced Key matrix.
  5. The defect point recognition method based on a deep learning network according to claim 1, characterized in that the regression loss function L_box consists of the L_iou loss function and the L1 loss function: L_box = 2*L_iou + 5*L1.
PCT/CN2021/115494 2021-08-18 2021-08-31 Defect point recognition method based on a deep learning network WO2023019636A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/810,608 US11615523B2 (en) 2021-08-18 2022-07-03 Methods for recognizing small targets based on deep learning networks

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110947457.3 2021-08-18
CN202110947457.3A CN113673594B (zh) 2021-08-18 2021-08-18 Defect point recognition method based on a deep learning network

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/810,608 Continuation US11615523B2 (en) 2021-08-18 2022-07-03 Methods for recognizing small targets based on deep learning networks

Publications (1)

Publication Number Publication Date
WO2023019636A1 2023-02-23

Family

ID=78543487

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115494 WO2023019636A1 (zh) 2021-08-18 2021-08-31 Defect point recognition method based on a deep learning network

Country Status (2)

Country Link
CN (1) CN113673594B (zh)
WO (1) WO2023019636A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953665B (zh) * 2023-03-09 2023-06-02 武汉人工智能研究院 Target detection method, apparatus, device, and storage medium
CN117292243B (zh) * 2023-11-24 2024-02-20 合肥工业大学 Deep-learning-based method, device, and medium for spatio-temporal image prediction of magnetocardiographic signals


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11501076B2 (en) * 2018-02-09 2022-11-15 Salesforce.Com, Inc. Multitask learning as question answering
KR102342337B1 (ko) * 2019-01-24 2021-12-23 가천대학교 산학협력단 System and method for diagnosing display panel defects using a deep learning neural network
CN112149619B (zh) * 2020-10-14 2024-03-15 南昌慧亦臣科技有限公司 Natural scene text recognition method based on a Transformer model
CN113240626B (zh) * 2021-04-08 2023-07-11 西安电子科技大学 Neural-network-based method for detecting and classifying concave-convex defects on glass cover plates
CN113241075A (zh) * 2021-05-06 2021-08-10 西北工业大学 Transformer end-to-end speech recognition method based on residual Gaussian self-attention

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190163904A1 (en) * 2017-05-24 2019-05-30 Estsecurity Corp. Apparatus for detecting variants of malicious code based on neural network learning, method therefor and computer readable recording medium storing program for performing the method
CN111260614A (zh) * 2020-01-13 2020-06-09 华南理工大学 Convolutional neural network fabric defect detection method based on an extreme learning machine
CN111681228A (zh) * 2020-06-09 2020-09-18 创新奇智(合肥)科技有限公司 Defect detection model, training method, detection method, apparatus, device, and medium
CN111899224A (zh) * 2020-06-30 2020-11-06 烟台市计量所 Nuclear power pipeline defect detection system based on a deep learning attention mechanism

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823718A (zh) * 2023-04-17 2023-09-29 南通大学 Deep-learning-based image classification method for yarn package defects
CN116823718B (zh) * 2023-04-17 2024-02-23 南通大学 Deep-learning-based image classification method for yarn package defects
CN116503880A (zh) * 2023-06-29 2023-07-28 武汉纺织大学 Method and system for recognizing English characters in slanted fonts
CN116503880B (zh) * 2023-06-29 2023-10-31 武汉纺织大学 Method and system for recognizing English characters in slanted fonts
CN116883409B (zh) * 2023-09-08 2023-11-24 山东省科学院激光研究所 Deep-learning-based conveyor belt defect detection method and system
CN116883409A (zh) * 2023-09-08 2023-10-13 山东省科学院激光研究所 Deep-learning-based conveyor belt defect detection method and system
CN117191821A (zh) * 2023-11-03 2023-12-08 山东宇影光学仪器有限公司 Real-time detection method for high-transmittance Fresnel lenses based on Deformable-DAB-DETR
CN117191821B (zh) * 2023-11-03 2024-02-06 山东宇影光学仪器有限公司 Real-time detection method for high-transmittance Fresnel lenses based on Deformable-DAB-DETR
CN117197727A (zh) * 2023-11-07 2023-12-08 浙江大学 Behavior detection method and system based on global spatio-temporal feature learning
CN117197727B (zh) * 2023-11-07 2024-02-02 浙江大学 Behavior detection method and system based on global spatio-temporal feature learning
CN117541554A (zh) * 2023-11-15 2024-02-09 江西理工大学 Surface defect detection method based on deep learning
CN117409264A (zh) * 2023-12-16 2024-01-16 武汉理工大学 Transformer-based multi-sensor data fusion method for robot terrain perception
CN117409264B (zh) * 2023-12-16 2024-03-08 武汉理工大学 Transformer-based multi-sensor data fusion method for robot terrain perception
CN117496131A (zh) * 2023-12-29 2024-02-02 国网山东省电力公司济南供电公司 Method and system for recognizing safe behavior at electric power work sites
CN117496131B (zh) * 2023-12-29 2024-05-10 国网山东省电力公司济南供电公司 Method and system for recognizing safe behavior at electric power work sites
CN117994254A (zh) * 2024-04-03 2024-05-07 江苏兴力工程管理有限公司 Overhead line insulator localization and recognition method based on a conditional cross-attention mechanism

Also Published As

Publication number Publication date
CN113673594A (zh) 2021-11-19
CN113673594B (zh) 2022-07-12

Similar Documents

Publication Publication Date Title
WO2023019636A1 (zh) Defect point recognition method based on a deep learning network
Weng et al. PTP: Parallelized tracking and prediction with graph neural networks and diversity sampling
CN110163286B (zh) 一种基于混合池化的领域自适应图像分类方法
CN112329690B (zh) 基于时空残差网络和时序卷积网络的连续手语识别方法
CN115496928B (zh) 基于多重特征匹配的多模态图像特征匹配方法
US11615523B2 (en) Methods for recognizing small targets based on deep learning networks
CN109522831B (zh) 一种基于微卷积神经网络的车辆实时检测方法
Nguyen et al. Boxer: Box-attention for 2d and 3d transformers
CN114724155A (zh) 基于深度卷积神经网络的场景文本检测方法、系统及设备
CN114663915A (zh) 基于Transformer模型的图像人-物交互定位方法及系统
CN117152416A (zh) 一种基于detr改进模型的稀疏注意力目标检测方法
Dian et al. Faster R-Transformer: An efficient method for insulator detection in complex aerial environments
CN114708297A (zh) 一种视频目标跟踪方法及装置
CN117058456A (zh) 一种基于多相注意力机制的视觉目标跟踪方法
CN114241515A (zh) 一种基于时空上下文特征感知的三维人体姿态估计方法
Song et al. JPV-Net: Joint point-voxel representations for accurate 3D object detection
Tan et al. Enhanced AlexNet with super-resolution for low-resolution face recognition
Zhang et al. A Defect Detection Model for Industrial Products Based on Attention and Knowledge Distillation
Wang et al. Scene uyghur recognition with embedded coordinate attention
Liang et al. Transformed dynamic feature pyramid for small object detection
Li et al. Refined division features based on Transformer for semantic image segmentation
CN114187569A (zh) 一种皮尔森系数矩阵与注意力融合的实时目标检测方法
Liu et al. MemFormer: A memory based unified model for anomaly detection on metro railway tracks
Xia et al. QNet: A quick deep neural network for real-time semantic segmentation
Li et al. Bagging R-CNN: Ensemble for Object Detection in Complex Traffic Scenes

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE