CN114792300A - Multi-scale attention X-ray broken needle detection method - Google Patents

Info

Publication number: CN114792300A
Application number: CN202210098820.3A
Authority: CN (China)
Prior art keywords: ray, image, foreign matter, anchor, detection
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114792300B (en)
Inventors: 庞子龙, 何欣, 张晨龙, 武戈, 李�赫
Current and original assignee: Henan University
Application filed by Henan University; priority taken from CN202210098820.3A; published as CN114792300A, granted as CN114792300B

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/214: Pattern recognition; generating training patterns, e.g. bagging or boosting
    • G06N 3/045: Neural-network architectures; combinations of networks
    • G06N 3/084: Neural-network learning methods; backpropagation, e.g. using gradient descent
    • G06T 2207/10081: Image acquisition modality; computed X-ray tomography [CT]
    • G06T 2207/20081: Special algorithmic details; training, learning
    • G06T 2207/20084: Special algorithmic details; artificial neural networks [ANN]

Landscapes: Engineering & Computer Science; Theoretical Computer Science; Physics & Mathematics; Data Mining & Analysis; General Physics & Mathematics; General Engineering & Computer Science; Evolutionary Computation; Life Sciences & Earth Sciences; Artificial Intelligence; Health & Medical Sciences; Computing Systems; Computational Linguistics; Biomedical Technology; Computer Vision & Pattern Recognition; General Health & Medical Sciences; Molecular Biology; Biophysics; Software Systems; Mathematical Physics; Quality & Reliability; Bioinformatics & Cheminformatics; Bioinformatics & Computational Biology; Evolutionary Biology; Image Analysis

Abstract

The invention discloses a multi-scale attention X-ray broken needle detection method. Tailored to the characteristics of industrial application, it applies a multi-scale attention X-ray foreign-matter detection algorithm and optimizes the structure of a residual neural network so that the model's detection speed and accuracy in industrial use meet industrial standards while its generalization ability remains strong. Compared with a traditional X-ray metal foreign-matter detection system, the deep-learning-based multi-scale attention algorithm can detect foreign matter in products that themselves contain metal, can be retrained to recognize other kinds of foreign matter as users require, and can be rapidly deployed on an X-ray needle detector, achieving fast, fully automatic broken-needle identification and detection with data logging and timely feedback.

Description

Multi-scale attention-based X-ray broken needle detection method
Technical Field
The invention belongs to the technical field of X-ray broken needle detection, and particularly relates to a multi-scale attention-based X-ray broken needle detection method for detecting broken metal needles in production workshops for textiles, shoes, clothing, bags and the like.
Background
X-ray imaging is widely applied in medical and security-inspection equipment, but needle-inspection equipment similar to an X-ray security-inspection machine is rarely used by other manufacturers at present. The core problem is that such equipment cannot achieve automatic detection and automatic foreign-matter alarming like a metal needle detector: it requires extra manual monitoring, has a high missed-detection rate, and cannot match the throughput of a production line, so it can only serve as offline spot-check equipment for products. The traditional needle detector relies on electromagnetic induction: when a ferromagnetic object enters the detection channel, the upper and lower probes sense it through a strong magnetic field and send signals to a signal receiving and processing module; the photoelectric scanning, power, sensitivity-control, signal-receiving, signal-output and actuator components then complete the detection. However, if the product itself contains metal, false detections can occur.
With the intensive research on deep learning for X-ray image processing, applications that independently detect particular classes of dangerous and prohibited goods with deep convolutional neural networks have appeared in recent years, achieving detection rates above 90% with false-alarm rates below 6%, an order-of-magnitude improvement over traditional methods that already exceeds manual inspection. If foreign matter in X-ray images can be identified automatically, automatic inspection in fields such as bags, toys, medicines and foods can be upgraded to online, real-time detection on the production line. Researching deep-learning-based automatic X-ray image analysis to realize automatic identification and detection of broken metal needles, with the features learned automatically by a deep neural network, is therefore a current research hotspot.
Disclosure of Invention
The invention provides a multi-scale attention X-ray broken needle detection method, aiming at the problems that X-ray inspection in existing industrial applications cannot achieve automatic detection and automatic foreign-matter alarming like a metal needle detector, requires extra manual monitoring, has a high missed-detection rate, and cannot match the throughput of a production line, so that it can only serve as offline spot-check equipment for products.
In order to achieve the purpose, the invention adopts the following technical scheme:
a multi-scale attention X-ray broken needle detection method comprises the following steps:
step 1, data acquisition: collecting an X-ray foreign-matter image data set for training a neural network by using an X-ray image acquisition card, annotating the data, and cleaning the data, wherein the annotation content comprises each foreign object's category label and position information; then classifying the data set by foreign-matter size, type, image size and contrast through exploratory data analysis, cleaning abnormal or difficult samples, and removing missing values and noise data to obtain the four-dimensional vector of each foreign-matter annotation box in the image; and dividing the data set into a training set and a test set in proportion;
step 2, training an X-ray broken needle detection model:
step 21, performing image mixing, label smoothing and random geometric transformation on the data-set images obtained in step 1: random cropping, random expansion, random horizontal flipping, random stretching and random angle rotation; and random color transformation: random changes of contrast, brightness, saturation and hue;
step 22, constructing a Cascade R-CNN neural network model and transferring Cascade R-CNN weights pre-trained on the public ImageNet data set to obtain the initial weights of the neural network;
step 23, performing difference-map three-channel processing on the preprocessed data set: taking the picture to be inspected as the first channel; taking a foreign-matter-free template picture collected in advance as the second channel; and performing a weighted difference operation on the first and second channels, taking the result as the third channel;
step 24, inputting the difference map into the modified ResNeSt50+FPN backbone network of the Cascade R-CNN and extracting the features of the image to be inspected;
step 25, generating anchor boxes with the region proposal network, cropping and filtering them, judging by a Softmax classifier whether each anchor contains foreign matter, and correcting the anchors with the bounding-box regression algorithm to form candidate windows; mapping the candidate windows onto the last convolutional feature map of the CNN, eliminating overlapping boxes with the non-maximum-suppression algorithm, and further screening the candidates; generating a fixed-size feature map for each RoI through the RoI pooling layer; jointly training the classification probability and box regression with Softmax Loss and Smooth L1 Loss; and finally computing the intersection-over-union (IoU) with the ground-truth box, comparing it with a preset threshold, and screening out the four-dimensional information of the qualifying foreign-matter detection boxes;
step 26, setting the IoU threshold to increase progressively at each output stage, finally obtaining the qualifying detection boxes, the output values being the foreign-matter detection boxes' four-dimensional information and IoU values;
step 27, repeating the above steps with iteration and backpropagation, continuously optimizing the neural network weights to obtain an optimized Cascade R-CNN neural network, then feeding it the test set; when the IoU on the test set reaches the final target threshold, the training of the Cascade R-CNN neural network is considered complete;
and 3, applying the Cascade R-CNN neural network model obtained in step 2 to an industrial X-ray needle-inspection procedure: inputting the image into the Cascade R-CNN model for detection, feeding the output detection boxes into an ImageNet-trained classifier to obtain the foreign-matter category, and outputting it together with the four-dimensional detection-box information obtained in the Cascade R-CNN, finally completing the detection;
and 4, displaying the foreign-matter detection boxes represented by the four-dimensional information on the original image, outputting them and saving them to a file.
Further, hard-sample mining is performed by the OHEM (online hard example mining) algorithm.
Further, image blending is performed as follows:
img_mixup = lam * img_org + (1 - lam) * img_mix

wherein lam is a hyper-parameter generated by a random number; img_org denotes the original image; img_mix denotes the image mixed in; and img_mixup denotes the blended, i.e. synthesized, image.
Further, the label smoothing is performed as follows:
q′(k|x) = (1 − ε)·δ_{k,y} + ε·u(k)

wherein ε is a hyper-parameter and u(k) is a distribution over the classes.
Further, a backbone network of Cascade R-CNN is constructed as follows:
increasing the number of split-attention convolution modules per Cardinal group in the ResNeSt50 network from two to three; after a picture is input into the network it passes through a series of conv and relu layers to obtain feature maps; the multiple feature maps generated by the FPN are input into the region proposal network one by one, with an additional conv+relu layer appended.
Further, the loss function of the Cascade R-CNN neural network model is as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)

wherein, for each anchor i: N_cls is the number of classes, equal to 2 in the binary task; L_cls is the two-class softmax loss; p_i is the predicted probability that the anchor is a foreign object, and the label p_i* is 1 when the anchor is a positive sample and 0 when it is a negative sample; t_i denotes the 4 coordinate positions of the i-th anchor, and t_i* denotes the ground-truth box coordinates associated with a positive anchor: each t contains a four-dimensional vector (x, y, w, h) giving the box center coordinates, width and height; N_reg is the number of regression boxes obtained by the bounding-box regression algorithm; and L_reg is the Smooth L1 loss:

L_reg(t_i, t_i*) = Σ smooth_L1(t_i − t_i*), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise.
further, the candidate frame window is generated as follows:
mapping each feature point back to the center of its receptive field on the original image as a reference point, then selecting k anchors of different scales and aspect ratios around the reference point; for each anchor, performing Softmax regression and binary classification through the convolutional neural network to judge it a positive or negative sample.
Further, the positive and negative samples are determined as follows:
for all anchors of each image in the training set, the anchor with the largest overlap with each annotated ground-truth box is marked as a positive sample; of the remaining anchors, any whose overlap with some annotated ground-truth box exceeds 0.7 is marked as a positive sample; any whose overlap with every annotated ground-truth box is below 0.3 is marked as a negative sample.
Compared with the prior art, the invention has the following beneficial effects:
according to the characteristics of industrial application, the multi-scale attention X-ray foreign-matter detection algorithm is applied to X-ray foreign-matter detection, and the structure of the residual neural network is optimized so that the model's detection speed and accuracy in industrial use meet industrial standards and its generalization ability is strong; compared with a traditional X-ray metal foreign-matter detection system, the deep-learning-based multi-scale attention algorithm can detect foreign matter in products that themselves contain metal, can be retrained to recognize other kinds of foreign matter as users require, and can be rapidly deployed on an X-ray needle detector, achieving fast, fully automatic broken-needle identification and detection with data logging and timely feedback.
Drawings
FIG. 1 is a basic flowchart of a multi-scale attention X-ray broken needle detection method according to an embodiment of the present invention;
fig. 2 is a network structure diagram of a multi-scale attention X-ray broken needle detection method according to an embodiment of the present invention;
fig. 3 is a network structure diagram of ResNeSt50 according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
As shown in fig. 1, the multi-scale attention-based X-ray broken needle detection method includes:
S101, collecting data.
An X-ray image acquisition card is used to collect an X-ray foreign-matter image data set for training the neural network; the data are annotated, the annotation content comprising each foreign object's category label and position information, and manually cleaned. The data set is then roughly classified by foreign-matter size, type, image size, contrast and the like through Exploratory Data Analysis; abnormal or difficult samples are cleaned (in particular, hard-sample mining is performed with the OHEM algorithm), and missing values and noise data are removed, giving the four-dimensional vector (x, y, w, h) of each foreign-matter annotation box in the image. The data set is divided into a training set and a test set at a 4:1 ratio.
The X-ray image acquisition card collects X-ray images of the objects to be inspected as the training set; labelImg is used to annotate the objects collected in multiple batches, the annotation information comprising a Bounding Box (x, y, w, h) and a class label for each box.
The data set is checked, missing values and noise data are cleaned manually, and the data set is finally divided into a training set and a test set at a 4:1 ratio.
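The 4:1 split described above can be sketched as follows; this is an illustrative helper, not code from the patent, and the function name is assumed:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the annotated samples and split them into a training set
    and a test set at a 4:1 (80/20) ratio."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Seeding the shuffle keeps the split reproducible across training runs.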
Specifically, before step S101, the method further includes: building the algorithm runtime and hardware environment, installing CUDA 11.1, Python 3.7 and the GPU build of PyTorch 1.8.0 on an Ubuntu system.
S102, training the X-ray broken needle detection model.
Step S102.1: performing image mixing (Mix-Up), Label Smoothing and random geometric transformation (each with its own range and probability settings) on the data-set images obtained in step S101: random cropping, random expansion, random horizontal flipping, random stretching and random angle rotation; and random color transformation: random changes of contrast, brightness, saturation and hue.
The training-set data are preprocessed, including image mixing (Mix-Up). Denoting the original image img_org, the image mixed in img_mix and the blended image img_mixup, each channel of the original image (img_org) and the mixed image (img_mix) is combined by a fixed-ratio linear transformation (pixel-by-pixel addition):

img_mixup = lam * img_org + (1 - lam) * img_mix
lam is a hyper-parameter generated by a random number following the Beta(α, α) distribution, with α taken as 1 by default. For the composite image, the loss function is then transformed to:

loss_mixup = lam * loss_org + (1 - lam) * loss_mix

wherein loss_org and loss_mix are the classification losses of the original picture and the mixed picture respectively, each a cross-entropy loss function with the formula:
loss = −Σ_{i=1}^{N} y_i · log(ŷ_i)

where y is the true label of the picture, ŷ is the model's predicted value for the picture, and N is the number of classes.
The synthesized image is label-smoothed (Label Smoothing); for the category label y_i:

y′_i = (1 − ε)·y_i + ε·u(k)

The originally Dirac-distributed label variable is replaced: with probability (1 − ε) it keeps the Dirac function, and with probability ε it follows the distribution u(k) over the classes. ε is a hyper-parameter whose value was obtained through repeated experiments; 0.1 is used, and the updated label vector replaces the traditional 0/1 one-hot label vector. Label smoothing acts as a regularization term, effectively preventing overfitting and making the model adapt better to difficult samples.
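A minimal sketch of the smoothing step, assuming (as is standard) that u(k) is uniform over the K classes; the function name is illustrative:

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """y' = (1 - eps) * y + eps * u(k), with u(k) assumed uniform (1/K).
    eps = 0.1 per the experiments described in the text."""
    k = y_onehot.shape[-1]
    return (1.0 - eps) * y_onehot + eps / k
```

The result is still a valid distribution (it sums to 1), with the 0/1 one-hot vector softened.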
Then the other data enhancements are performed: random cropping, random expansion, random horizontal flipping, random stretching and random angle rotation. After these operations the images are uniformly resized to 256 × 256, completing the data enhancement.
Step S102.2: constructing the Cascade R-CNN neural network model and transferring Cascade R-CNN weights pre-trained on the public ImageNet data set to obtain the initial weights of the neural network.
Specifically, the public ImageNet data set can be adopted for transfer learning: training starts from the pre-trained model as a checkpoint to support the new task; alternatively, the parameters of another Cascade R-CNN neural network model are loaded directly, the parameters are transferred, and the features are fine-tuned.
Step S102.3: performing difference-map three-channel processing on the preprocessed data set, the three channels being formed as follows:
The picture to be inspected is taken as the first channel; a foreign-matter-free template picture collected in advance is taken as the second channel; a weighted difference of the first and second channels is taken as the third channel, the difference weight being obtained experimentally. Note that the feature-extraction convolutional network of the invention is trained and weighted on an open-source X-ray image data set (i.e., the public ImageNet), giving it a reasonable initial feature-extraction capability; since the input layer is adjusted to the three difference-map channels, the difference maps are used to fine-tune the features on this data set before feature extraction.
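The three-channel composition can be sketched as follows (NumPy); the weight `w` and the clipping to the 8-bit range are illustrative assumptions, since the patent only says the weight is found experimentally:

```python
import numpy as np

def difference_map(test_gray, template_gray, w=1.0):
    """Stack [test image, template image, w * (test - template)] into the
    3-channel difference-map input described in step S102.3."""
    diff = np.clip(
        w * (test_gray.astype(np.float32) - template_gray.astype(np.float32)),
        0, 255)
    return np.stack(
        [test_gray, template_gray, diff.astype(test_gray.dtype)], axis=-1)
```

Regions identical to the template cancel out in the third channel, so foreign matter stands out there.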
Step S102.4: inputting the difference map into the modified ResNeSt50+FPN backbone network of the Cascade R-CNN and extracting the features of the image to be inspected.
The backbone network is ResNeSt50 plus a Feature Pyramid Network (FPN). The network is modified for industrial application according to the characteristics of ResNeSt: the number of convolution modules per Cardinal group is increased from two to three. After an image enters the network it passes through a series of conv and relu layers to produce feature maps; the multiple feature maps generated by the FPN are fed into the region proposal network one by one, an additional conv+relu layer is appended, and 51 × 39 feature maps are output, ready for subsequent proposal selection; at this point coordinates can still be mapped back to the original image.
The difference map is input into the modified ResNeSt50+FPN backbone of the Cascade R-CNN model, and features are extracted through repeated convolution and pooling to obtain feature maps. The residual block of each layer is computed as:

x_{l+1} = x_l + F(x_l, W_l)
in a modification based on the reseest network, a decentralized attention mechanism is added:
S(x)=F 1 (x l ,W l )+F 2 (x l ,W l )
aiming at small foreign matters in industrial level, the invention increases the modules of convolution attentions of distraction from two to three, divides the characteristics into a plurality of low-dimensional embedding, and aggregates the low-dimensional embedding to obtain S c (x) And the deep semantic features can be obtained better:
S c (x)=F 1 (x l ,W l )+F 2 (x l ,W l )+F 3 (x l ,W l )
wherein F_1, F_2 and F_3 are convolutional blocks of the convolutional neural network; a convolution operation is performed on each low-dimensional embedding to obtain a feature map. For the output x_L, the final calculation formula is:

x_L = x_l + Σ_{i=l}^{L−1} F(x_i, W_i)
for the FPN network, the scaling of the layer number of each layer of feature pyramid from bottom to top is set
scale(f1,f2,f3,f4)=(1/4,1/8,1/16,1/32)
The final output feature map has shape 51 × 39 × 256, and the scaling is recorded in the spatial_scale vector for subsequently mapping back to the original size.
Step S102.5: generating anchor boxes with a Region Proposal Network (RPN), cropping and filtering them, judging by a Softmax classifier whether each anchor contains foreign matter, and correcting the anchors with the Bounding Box Regression algorithm to form more accurate candidate windows. The candidate windows are mapped onto the last convolutional feature map of the CNN; overlapping boxes are eliminated by Non-Maximum Suppression (NMS) to further screen the candidates; a fixed-size feature map is generated for each RoI by RoI Pooling; the classification probability and bounding-box regression are trained jointly with Softmax Loss and Smooth L1 Loss. Finally the IoU (intersection-over-union) with the ground-truth box is computed and compared with the initial threshold to screen out the four-dimensional information of the qualifying foreign-matter detection boxes.
The loss function of the Cascade R-CNN neural network model is as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)

wherein, for each anchor i: N_cls is the number of classes, equal to 2 in the binary task; L_cls is a two-class softmax with 2 score outputs indicating the probabilities of being and of not being a foreign object, p_i being the predicted probability; a bounding-box regressor output is then attached to give the 4 coordinate positions t_i of the i-th anchor; the label p_i* is 1 when the anchor is a positive sample and 0 when it is a negative sample; t_i* denotes the ground-truth box coordinates associated with a positive anchor, each t being a four-dimensional vector (x, y, w, h) for the box center coordinates, width and height. N_reg is the number of regression boxes obtained by the bounding-box regression algorithm, and L_reg is:

L_reg(t_i, t_i*) = Σ smooth_L1(t_i − t_i*), with smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise.
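The Smooth L1 term can be sketched directly (NumPy; function names are illustrative):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 if |x| < 1, else |x| - 0.5, applied elementwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def l_reg(t, t_star):
    """L_reg(t_i, t_i*): sum of Smooth L1 over the 4 box coordinates."""
    return smooth_l1(np.asarray(t, dtype=float) - np.asarray(t_star, dtype=float)).sum()
```

The quadratic region near zero keeps gradients small for near-correct boxes, while the linear region bounds the gradient for outliers.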
the Cascade R-CNN neural network structure has a gradient descent algorithm optimizer of Adam, and Cosine Annealing learning rate (Cosine Annealing) is used:
Figure BDA0003491715090000086
wherein
Figure BDA0003491715090000087
And
Figure BDA0003491715090000088
represents a range of learning rate variation; t is a unit of cur Representing how many epochs pass after the last restart; t is i Representing how many epochs in total need to be trained for the ith restart.
Further, step S102.5 comprises:
step S102.5.1: and inputting the feature map into a candidate area network according to the extracted features, and calculating to preliminarily obtain an extracted candidate frame.
The specific process of the region proposal network algorithm is as follows:
for the input feature map, the feature map is scaled to 800 × 600, k anchors (default k is 9) are set for each point of the image after passing through the fifth convolutional layer, the feature point is mapped back to the center point of the receptive field of the original image to serve as a reference point, then k anchors of different scales and aspect ratios are selected around the reference point, each anchor is divided into positive and negative, each anchor has (x, y, w, h) 4 offsets corresponding to 9 rectangles of the upper left corner and the lower right corner coordinates of the detection frame respectively, and 3 shapes are shared, and the aspect ratio is about {1:1,1:2,2:1 }. For each anchor, Softmax regression and two classifications are carried out through a convolutional neural network, and the anchor is judged to be positive (positive sample) or negative (negative sample).
For the division of positive and negative samples, the region proposal network examines all anchors (N × M × k) of each image in the training set (which contains the manually annotated ground-truth boxes): the anchor with the largest overlap with each annotated ground-truth box region is marked as a positive sample (ensuring that each ground truth corresponds to at least one positive anchor); of the remaining anchors, any whose overlap with some annotated region exceeds 0.7 is marked as a positive sample (each ground-truth box may correspond to several positive anchors, but each positive anchor corresponds to only one ground-truth box); any whose overlap is below 0.3 is marked as a negative sample.
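The overlap measure and the 0.7/0.3 labeling rule can be sketched as follows (illustrative names; the best-overlap rule that guarantees one positive anchor per ground truth is omitted for brevity):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """1 = positive (IoU > 0.7 with some ground truth),
    0 = negative (IoU < 0.3 with all), -1 = ignored in training."""
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return -1
```

Anchors in the 0.3–0.7 band contribute to neither the classification nor the regression loss.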
For the multiple positive anchors generated, a Bounding Box Regression algorithm is adopted for further screening; the specific flow is as follows:
a four-dimensional vector { x, y, w, h } is typically used for the window, representing the center coordinates and width and height, respectively, of the window. Given the candidate anchor a and the marker anchor GT:
A = (A_x, A_y, A_w, A_h),  GT = (G_x, G_y, G_w, G_h)
given the mapping of A to GT:
G′_x = A_w · d_x(A) + A_x
G′_y = A_h · d_y(A) + A_y
G′_w = A_w · exp(d_w(A))
G′_h = A_h · exp(d_h(A))
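Applying the four mapping equations above (the width/height terms are reconstructed here in the standard exponential form, since the original equations are illegible in the source) can be sketched as:

```python
import math

def apply_deltas(anchor, d):
    """Map anchor A = (ax, ay, aw, ah) (center, width, height) to the
    refined box using the regression outputs d = (dx, dy, dw, dh)."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = d
    return (aw * dx + ax,        # G'_x
            ah * dy + ay,        # G'_y
            aw * math.exp(dw),   # G'_w
            ah * math.exp(dh))   # G'_h
```

A zero delta leaves the anchor unchanged; the exponential keeps the predicted width and height positive.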
Linear regression is used to model and fine-tune the window: for the given set of four input feature vectors a set of parameters W_* is learned, with the loss function set as

Loss = Σ_i (t_*^i − Ŵ_*^T · φ(A^i))²

where t_*^i denotes the true regression target; the optimization objective is

Ŵ_* = argmin_{W_*} Σ_i (t_*^i − W_*^T · φ(A^i))² + λ‖W_*‖²
the value of the X after linear regression is close to the true value GT, and the translation and scaling characteristic vector of the X is finally obtained, so that the regression parameter aiming at the Positive Anchor is achieved, and the X is approximate to the GT. Only when GT is relatively close to the position of the required regression frame, the above linear transformation is considered to be established by approximation.
Step S102.5.2: the anchors subjected to Bounding Box Regression in step S102.4 are sorted by confidence; positive anchors exceeding the image boundary are clipped to the image boundary, preventing them from exceeding the image in the subsequent RoI Pooling; positive anchors of very small size are removed; redundant predictions are removed with the NMS algorithm and the remaining predictions are output. The IoU threshold of the NMS algorithm was experimentally set to 0.45, and the loss function is:
L_nms = −log p_u
i.e., the log loss on the true class u, where p is the vector of predicted probabilities over the C classes. In practice this amounts to adding a classification error term.
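Greedy NMS with the 0.45 threshold mentioned above can be sketched as follows (a plain-Python illustration; real detection frameworks use a vectorized version):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.45):
    """Greedy NMS: keep the highest-scoring box, drop remaining boxes overlapping it above iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thr]
    return keep
```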
Step S102.5.3: using the RoI Pooling layer, the proposals are collected and mapped back to the feature map scale of size (M/16) × (N/16) using the spatial_scale parameter; the feature map region corresponding to each proposal is divided into a grid of pooled_w × pooled_h cells; max pooling is applied to each cell. The proposal feature maps are thus computed and sent to the subsequent network.
Step S102.5.4: using the obtained proposal feature maps, the specific category of each proposal is calculated through the fully connected layers and softmax (in the experiments, foreign matter is divided into nine categories), and the classification probability vector is output; at the same time, bounding box regression is applied again to obtain the position offset of each proposal, which is used to regress a more accurate target detection frame. Finally, the position information, confidence and foreign matter category of the generated detection frames are output.
Step S102.6: taking the output of step S102.5.4 as input, the steps preceding step S102.5.4 are repeated, with the IoU threshold of each pass gradually increased to 0.5, 0.6 and 0.7 respectively; detection boxes meeting the conditions are finally obtained, and the outputs are the four-dimensional information and IoU values of the foreign matter detection boxes.
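The cascade idea of step S102.6 (re-refine the boxes, then re-filter them at a stricter IoU threshold on each pass) can be sketched with a toy refinement head. This is illustrative only: `halfway` stands in for the learned per-stage regression head, while the 0.5/0.6/0.7 thresholds come from the text.

```python
def iou1(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cascade(proposals, gt, refine, thresholds=(0.5, 0.6, 0.7)):
    """Each stage refines the surviving boxes, then keeps only those whose IoU with the
    ground truth clears a progressively stricter threshold."""
    for thr in thresholds:
        proposals = [refine(p, gt) for p in proposals]
        proposals = [p for p in proposals if iou1(p, gt) >= thr]
    return proposals

def halfway(p, gt):
    """Toy stand-in for a stage's regression head: move the box halfway toward the ground truth."""
    return tuple((pc + gc) / 2.0 for pc, gc in zip(p, gt))
```

A decent proposal is progressively tightened and survives all three stages, while a bad one is discarded at the first, stricter check.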
Step S102.7: the above steps are repeated for iteration and back-propagation, continuously optimizing the weights of the neural network under training; after multiple iterations a well-trained Cascade R-CNN neural network is obtained and evaluated on the test set, and training of the Cascade R-CNN neural network is considered finished when the intersection-over-union on the test set reaches the final target threshold. In multiple experiments, the network parameters obtained after 240 training rounds met the requirements of industrial application.
Step S103: the Cascade R-CNN neural network model was used for detection.
The Cascade R-CNN neural network model with well-trained parameters obtained in step S102 is applied to the industrial X-ray needle-detection procedure: the image is input into the Cascade R-CNN neural network model obtained in step S102 for detection; the output detection frame is input into an ImageNet-trained classifier to obtain the foreign matter category, which is output together with the four-dimensional information of the foreign matter detection frame obtained in the Cascade R-CNN; detection is thereby completed.
The classifier of step S103 classifies only the foreign objects in the generated candidate frames, and its classification loss function is:
L_cls(p, u) = −log p_u
As shown in fig. 1, the whole needle-detection process for an X-ray image is as follows: the object to be detected is first placed in an X-ray machine; the X-ray image to be detected is collected and input to the image analysis system; the image is preprocessed in a unified manner; the preprocessed image is then input to the trained Cascade R-CNN neural network model to obtain the four-dimensional vector and label vector of the foreign matter labeling frame, which are displayed on the display system, completing the detection.
Step S104: the foreign object detection frame indicated by the four-dimensional information is displayed on the original image, and is output and saved.
Further description is as follows:
In step S102.4, a convolutional neural network is used as the feature extractor; the loaded image is repeatedly convolved and downsampled to complete forward propagation, and the original features of the input image at different sizes are extracted to obtain feature maps. The convolutional neural network offers high fault tolerance and strong self-learning capability, along with weight sharing and automatic feature extraction, giving it great advantages in the fields of image recognition and target detection.
In step S102.4, as shown in fig. 2 and fig. 3, a modified ResNeSt50 network is adopted and a split-attention mechanism is used to train the neural network: on the basis of BottleNeck, the convolutional layer is divided into several different sub-modules according to differences in receptive field, forming a sliced attention supervision mechanism that adds contextual information to the features and enhances feature expression.
Step S102.4 introduces a feature pyramid network structure. Its principle is that, for a single input picture, features are extracted at each of several scales (three different scales are adopted in the invention) to generate a multi-scale feature representation; the sampled results are connected and fused to obtain fusion features, producing a multi-dimensional feature expression of the image. Lower-level features carry less semantic information but provide accurate target positions, supplying precise localization to the semantically rich upper-level features and generating feature maps with stronger expressive capability. To eliminate the aliasing effect of up-sampling, each fusion result is convolved after fusion (with a 3 × 3 convolution kernel), finally achieving the prediction effect.
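The top-down fusion described above can be sketched in NumPy (a minimal illustration using nearest-neighbour upsampling plus element-wise addition; the lateral 1×1 convs and the 3×3 anti-aliasing convs mentioned in the text are omitted, and the function names are hypothetical):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features):
    """Top-down pathway: each finer map is summed with the upsampled coarser result.
    `features` is ordered finest first; the output keeps the same order."""
    merged = [features[-1]]
    for f in reversed(features[:-1]):
        merged.append(f + upsample2x(merged[-1]))
    return merged[::-1]
```

With three levels of shapes (C, 8, 8), (C, 4, 4) and (C, 2, 2), the finest output accumulates information from all coarser levels while retaining its original resolution.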
In step S102.5, the prediction box is generated by:
First, at the scale of the original image, for each feature map from which features are extracted at a different sampling depth, a different number of candidate anchor frame sets is configured according to the sampling depth; nine candidate frames of different sizes are set for each set, and each anchor frame carries a four-dimensional feature vector.
Classification and frame regression are performed on the feature vector of each anchor frame and the results are aggregated, achieving preliminary screening and preliminary offsetting. The confidence value of each class of candidate frame is obtained through normalization and compared with the initial intersection-over-union threshold: anchor frames above the threshold are kept as candidate frames, while those below it are deleted directly. This achieves the purpose of screening candidate frames; the object position is then further calculated, finally yielding a group of bounding box lists.
It is worth mentioning that:
the difference image three-channel processing is carried out on the input image, the characteristics of the foreign matters are more obvious through a Cascade R-CNN neural network model after pre-training parameters, the characteristic pyramid fuses the multilayer characteristics, the effect of deepening the characteristics of the foreign matters is also achieved, and learning is carried out through back propagation.
The confidence threshold can be set according to the user's requirements: for projects with lower detection precision requirements, a low threshold is selected to improve the detection speed; for projects with high detection precision requirements, the threshold is increased to improve detection precision. This is flexible and convenient.
In this scheme, difference-image processing is performed on the X-ray image: grey-scale processing of the image to be detected is carried out; a difference operation between the processed template image and the image to be detected yields the difference image; and the difference image, the grey-scale-processed template image and the image to be detected together form the three-channel input to the neural network.
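The three-channel construction can be sketched as follows (an illustrative NumPy sketch; the weight `w` of the "weighted difference operation" is an assumed parameter, since the text does not give its value):

```python
import numpy as np

def diff_three_channel(test_gray, template_gray, w=1.0):
    """Build the 3-channel network input: grey test image, grey template, and their
    weighted difference. `w` is an assumed weighting parameter for illustration."""
    diff = np.clip(test_gray.astype(np.float64) - w * template_gray.astype(np.float64), 0, 255)
    return np.stack([test_gray, template_gray, diff.astype(np.uint8)], axis=-1)
```

Pixels where the image to be detected deviates from the foreign-matter-free template show up as non-zero values in the third channel, which is what makes the foreign matter features more salient to the network.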
In this scheme, Cascade R-CNN is used as the target detection model. Cascade R-CNN has various network structures, is very flexible to use, can compensate for weaknesses according to different project requirements and exploit the advantages of different detection networks; being based on the PyTorch framework, it is also easier to port. While maintaining a very lightweight model size, its accuracy is comparable to the Cascade R-CNN benchmark.
In conclusion, according to the characteristics of industrial application, the multi-scale attention X-ray foreign matter detection algorithm is applied to X-ray foreign matter detection, the structure of a residual error neural network is optimized, the detection speed and the detection precision of the model in industrial application meet the industrial standard through improvement, and the generalization capability of the model is strong; compared with a traditional X-ray metal foreign matter detection system, the deep learning-based multi-scale attention X-ray foreign matter detection algorithm can detect the product foreign matter containing metal, can identify other foreign matters according to the user requirements after retraining, can be rapidly deployed on an X-ray needle detector, and achieves rapid, full-automatic, data-saving and timely-feedback broken needle identification and detection.
While only the preferred embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention.

Claims (8)

1. A multi-scale attention X-ray broken needle detection method is characterized by comprising the following steps:
step 1, data acquisition: collecting an X-ray foreign matter image data set for training a neural network by using an X-ray image acquisition card, carrying out data annotation and data cleaning, wherein the annotation content comprises a foreign matter category label and position information; then, through exploratory data analysis, classifying the data set according to foreign matter size, foreign matter type, image size and contrast; cleaning abnormal or difficult samples and removing missing values and noise data to obtain the four-dimensional vector information of the foreign matter labeling frames in the images; and dividing the data set into a training set and a test set in proportion;
step 2, training an X-ray broken needle detection model:
step 21, image mixing, label smoothing and random geometric transformation are carried out on the data set image obtained in the step 1: random cutting, random expanding, random horizontal turning, random stretching and random angle rotation; random color transformation: random transformation of contrast, brightness, saturation and chroma;
step 22, constructing a Cascade R-CNN neural network model, and migrating the Cascade R-CNN pre-training weight by using a public ImageNet to obtain an initial weight of the neural network;
step 23, performing difference diagram three-channel processing on the preprocessed data set: taking a picture to be detected as a first channel; taking a pre-collected template picture without foreign matters as a second channel; carrying out weighted difference operation on the first channel and the second channel, and taking an operation result as a third channel;
step 24, inputting the difference map into the modified ResNeSt50+FPN backbone network in Cascade R-CNN, and extracting the features of the image to be detected;
step 25, generating an anchor box by using the candidate area network, cutting and filtering the anchor box, judging whether the anchor box is a foreign matter by using a Softmax classifier, and correcting the anchor box by using a frame regression algorithm to form a candidate frame window; mapping the candidate frame window to the last layer of convolution characteristic graph of the CNN, eliminating overlapped frames through a non-maximum suppression algorithm, and further screening candidate frames; generating a characteristic diagram with a fixed size for each RoI through a RoI pooling layer; performing regression combination training on the classification probability and the frame by utilizing Softmax Loss and Smooth L1 Loss; finally, calculating an intersection ratio IoU with the actual frame, comparing with a preset threshold value, and screening to obtain four-dimensional information of the foreign matter detection frame meeting the conditions;
step 26, setting IoU threshold values output each time to be gradually increased, and finally obtaining detection frames meeting conditions, wherein the output values are four-dimensional information and IoU values of the foreign matter detection frames;
step 27, repeating the steps for iteration and back propagation, continuously optimizing the weight of the training neural network to obtain an optimized Cascade R-CNN neural network, inputting the optimized Cascade R-CNN neural network into a test set, and considering that the training of the Cascade R-CNN neural network is finished when the cross-over ratio of the test set reaches a final target threshold;
and 3, applying the Cascade R-CNN neural network model obtained in the step 2 to an industrial X-ray needle detection procedure: inputting the image into a Cascade R-CNN neural network model for detection, inputting the output detection frame into a classifier of ImageNet to obtain the category of the foreign matter, outputting the category together with the four-dimensional information of the foreign matter detection frame obtained in the Cascade R-CNN, and finally finishing the detection;
and 4, displaying the foreign matter detection frame represented by the four-dimensional information on the original image, outputting the foreign matter detection frame and saving the foreign matter detection frame in a file.
2. The multi-scale attention X-ray needle breakage detection method as claimed in claim 1, wherein in the step 1, hard sample mining is performed through an OHEM algorithm.
3. The multi-scale attention X-ray needle breakage detection method as claimed in claim 1, wherein image mixing is performed as follows:
img_mixup = lam · img_org + (1 − lam) · img_mix
wherein lam is a hyper-parameter generated by a random number; img_org denotes the original image; img_mix denotes the image being mixed in; img_mixup denotes the mixed, i.e. composite, image.
4. The multi-scale attention X-ray needle breakage detection method as claimed in claim 1, wherein the label smoothing is performed as follows:
q′(k|x) = (1 − ε) · δ_{k,y} + ε · u(k)
wherein ε is a hyper-parameter; u(k) denotes a distribution over the classes.
5. The multi-scale attention X-ray broken needle detection method as claimed in claim 1, wherein the backbone network of Cascade R-CNN is constructed as follows:
the two convolution attention modules of each cardinal group in the ResNeSt50 network are changed into three convolution attention modules; after a picture is input into the network, it sequentially passes through a series of conv and relu layers to produce feature maps; the several feature maps generated by the FPN are input one by one into the candidate region network, and an additional conv + relu layer is added.
6. The multi-scale attention X-ray broken needle detection method as claimed in claim 1, wherein the loss function of the Cascade R-CNN neural network model is as follows:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*)
wherein, for each anchor, N_cls is the number of classes, whose value in the two-class task is 2; L_cls is a two-class softmax loss; p_i denotes the predicted probability that the i-th anchor is a foreign object, and 1 − p_i the probability that it is not; t_i represents the 4 coordinate positions of the anchor, with i indexing the anchors; p_i* = 1 when the anchor is a positive sample, and p_i* = 0 when it is a negative sample; t_i* represents the ground truth box coordinates associated with a positive-sample anchor, each ground truth containing the four-dimensional vector x, y, w, h, which respectively represent the center coordinates, width and height of the box; N_reg is the number of regression frames obtained through the frame regression algorithm; and L_reg is Smooth L1 Loss:
smooth_L1(x) = 0.5 · x², if |x| < 1; |x| − 0.5, otherwise
7. the multi-scale attention X-ray broken needle detection method as claimed in claim 1, wherein the frame candidate window is generated as follows:
mapping the feature points back to the center points of their receptive fields in the original image as reference points, and then selecting k anchors of different scales and aspect ratios around each reference point; for each anchor, performing two-class softmax classification on it through the convolutional neural network to judge whether it is a positive or negative sample.
8. The multi-scale attention X-ray broken needle detection method as claimed in claim 7, wherein the positive and negative samples are judged as follows:
for all anchors of each image in the training set, the anchor with the largest overlap ratio with each calibrated ground truth box region is marked as a positive sample; among the remaining anchors, any with an overlap ratio greater than 0.7 with some calibrated ground truth box region is marked as a positive sample; an anchor whose overlap ratio with every calibrated ground truth box region is less than 0.3 is marked as a negative sample.
CN202210098820.3A 2022-01-27 2022-01-27 X-ray broken needle detection method based on multi-scale attention Active CN114792300B (en)

Publications (2)

CN114792300A (published 2022-07-26)
CN114792300B (granted 2024-02-20)
