CN111783590A - Multi-class small target detection method based on metric learning - Google Patents

Multi-class small target detection method based on metric learning

Info

Publication number
CN111783590A
Authority
CN
China
Prior art keywords
network
node
layer
loss
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010583655.1A
Other languages
Chinese (zh)
Inventor
王靖宇
王叶子
张科
吴虞霖
王霰禹
张国俊
苏雨
王震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010583655.1A
Publication of CN111783590A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Abstract

The invention relates to a multi-class small target detection method based on metric learning. According to the recognition characteristics of multi-class small targets, the feature expression capability of deep learning is combined with the similarity discrimination capability of metric learning, and a novel deep neural network structure is designed. A Faster RCNN network structure combined with a feature pyramid network (FPN) detects multi-class small targets from the whole image data; a graph network module embedded in the network transmits and computes the similarity information among all regions of the image; and a similarity measurement module based on triplet loss at the back end of the network distinguishes the detail information between samples. The feature information of the small targets and the similarity relations between targets are thereby fully extracted, improving the accuracy of multi-class small target detection.

Description

Multi-class small target detection method based on metric learning
Technical Field
The invention relates to a multi-class small target detection method based on metric learning, and belongs to the technical field of image processing.
Background
Target detection is a key research topic in the field of computer vision, and target detection technology is now widely applied in automatic industrial inspection, medical imaging diagnosis, remote sensing image analysis and other fields (Khosravan N, Bagci U. S4ND: Single-Shot Single-Scale Lung Nodule Detection [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2018: 794-). Multi-class small target detection refers to a detection task in which an image contains two or more different target classes and the absolute size of each target is less than 32 × 32 pixels, or its relative size is less than 0.1 times that of the original image (M. Kisantal, Z. Wojna, J. Murawski, et al. Augmentation for small object detection [J]. arXiv preprint arXiv:1902.07296, 2019.); it has wide application prospects in many fields. Small targets yield less feature information, and multiple classes introduce larger inter-class similarity and intra-class difference, so the detection of multi-class small-size targets has become a research hotspot and difficulty in the field of target detection.
Multi-scale feature combination methods merge upper-layer and bottom-layer feature information, enriching the semantic information of the bottom layer while keeping a high resolution, and can effectively improve the localization and detection precision of a neural network for small-size targets (Zheng Qiumei et al. Small target detection in traffic scenes based on an improved convolutional neural network [J/OL]: 1-9). However, such methods consider only the spatial information of the feature map and ignore the interrelationships between the various targets, so it is difficult to classify small targets accurately in a multi-class detection setting. Exploring a technical route that achieves accurate localization and classification of multi-class small targets therefore has important research significance and application value.
Disclosure of Invention
Technical problem to be solved
In complex multi-class small target detection scenes, the scarce visual feature information and the large inter-class similarity and intra-class difference make existing deep neural networks, which consider only the characteristics of the small targets themselves, perform poorly, causing missed detections and class confusion; capturing the interrelations between targets is therefore as important as the targets' own characteristics. In order to avoid the defects of the prior art, the invention provides a multi-class small target detection method based on metric learning.
Technical scheme
A multi-class small target detection method based on metric learning is characterized by comprising the following steps:
Step 1: construct the multi-class small target data set: photograph various PCBs of different models with an industrial camera and store the images in JPEG format; establish a classification criterion for the electronic components according to their different types and packaging forms, and use Labelme software for image annotation to obtain annotation files in xml format; expand the number of PCB images with affine transformations to obtain the PCB image data set; make the PCB image data set and the xml annotation files into the VOC2007 data set format;
Step 2: construct the graph network module and embed it into the ResNet101 network of the Faster RCNN, where the ResNet101 network comprises the five convolution modules conv_1, conv_2, conv_3, conv_4 and conv_5; the concrete steps are as follows: design the similarity calculation function and the graph convolution layer structure, and divide the output feature map of the upper convolution layer into a grid of N small blocks, each grid cell serving as an input node of the graph network; that is, X represents the output feature map of the upper convolution layer and is divided into N regions of equal size, where node X_i represents the i-th of the N regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the N-1 regions other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N, and g(X_j) is a convolution operation on X_j with a 1 × 1 kernel;
the node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j;
the above calculation is performed for every region node, finally yielding N new node outputs Z_i, which form the corresponding feature map Z; Z is equal in size to the input feature map X and contains the correlation information between the regions;
three graph network modules are embedded after the first three convolution modules conv_1, conv_2 and conv_3 of the ResNet101 network, respectively;
Step 3: design the Faster R-CNN structure combined with FPN, applying the feature pyramid network FPN in ResNet101: extract the output feature maps of the last residual block of each of the last four modules of the ResNet101 network, denoted C2, C3, C4 and C5; pass the C2, C3, C4 and C5 layers through 1 × 1 convolution kernels; upsample the low-resolution, strongly semantic feature maps of the higher layer by nearest-neighbor interpolation to the size of the layer below, add them element-wise to the high-resolution, weakly semantic feature maps of the lower layer, and convolve with a 3 × 3 kernel to obtain the P2, P3, P4 and P5 layers respectively; the P6 layer is obtained by 0.5× down-sampling of the P5 layer;
next, the RPN generates a series of region candidate boxes in the five feature layers P2, P3, P4, P5 and P6 through its anchor mechanism, and finally the prediction results of all layers are connected and fused; each Proposal generated by the RPN is mapped to the corresponding feature layer according to its area, and the region of interest pooling (ROI Pooling) operation is performed next; ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7;
after each feature map sample passes through two fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target; the loss function L is calculated and the parameters of the whole network are updated to obtain the trained model, where the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise;

the factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal;
Step 4: construct the similarity measurement module based on triplet loss and replace the classification branch at the end of the Faster RCNN network; select the triplets (a, p, n) with a semi-hard mining strategy, where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is

L = max(d(a, p) - d(a, n) + margin, 0)

centering on the semi-hard region, samples are selected to satisfy d(a, p) < d(a, n) < d(a, p) + margin;
on this basis, design the convolutional neural network: input the selected triplets into three convolutional neural networks with identical structure and shared weights, and let the network learn, through the triplet loss and model training, discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module; embed this module at the back end of the Faster R-CNN model, replacing the original normalized exponential function classification structure, and perform label classification of the regions of interest to obtain the class of each target;
Step 5: train the deep neural network obtained in steps 2-4 end-to-end on the training and validation sets of the PCB data set; for each picture input to the neural network, execute the forward and backward propagation steps and update the internal parameters of the model based on the loss function L({p_i}, {t_i}), obtaining a multi-class small target detection model for detecting the electronic components on PCB images;
Step 6: input the test set of the PCB data set into the trained deep neural network model and detect the electronic component targets in the PCB images.
The quantity expansion in step 1 includes random cropping, rotation and flipping.
N is 1024.
Advantageous effects
The invention provides a multi-class small target detection method based on metric learning. Aiming at the recognition characteristics of multi-class small targets, the feature expression capability of deep learning is combined with the similarity discrimination capability of metric learning, and a novel deep neural network structure is designed. A Faster RCNN network structure combined with a feature pyramid network (FPN) detects multi-class small targets from the whole image data; a graph network module embedded in the network transmits and computes the similarity information among all regions of the image; and a similarity measurement module based on triplet loss at the back end of the network distinguishes the detail information between samples. The feature information of the small targets and the similarity relations between targets are thereby fully extracted, improving the accuracy of multi-class small target detection.
The method has the advantages that:
(1) Through the second step of the invention, the graph network module computes the correlation relations among all regions of the image in a cross-region manner, enhancing the sensitivity of the network to target positions and improving target localization performance.
(2) Through the third step of the invention, the FPN is combined with the Faster RCNN; the multi-scale feature fusion avoids the loss of small-target detail information and thus strengthens the representation of small-target features.
(3) Through the fourth step of the invention, the similarity measurement module based on triplet loss performs ROI label classification for the various targets, replacing the Softmax layer, whose features are separable but insufficiently discriminative; the network thus learns discriminative features sufficient to distinguish the detail information between classes, improving the classification accuracy for small targets.
Drawings
FIG. 1 is a data set construction flow diagram
FIG. 2 is an algorithm flow chart
FIG. 3 is a diagram of a deep neural network architecture
FIG. 4 is a graph of test results
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
the invention aims to provide a multi-class small target detection method based on metric learning, which is realized by the following technical scheme and comprises the following specific steps:
Step one, construct the multi-class small target data set. Taking the electronic components on printed circuit boards (PCBs) as the research objects, a PCB data set is established; the specific process is as follows: first, various PCBs of different models are photographed with an industrial camera and the images are stored in JPEG format; second, a classification criterion for the electronic components is established according to their types and packaging forms (i.e., category labels corresponding to electronic components of different types and packaging forms), and Labelme software is used for image annotation, producing annotation files in xml format; next, affine transformations, including random cropping, rotation (90°, 180°, 270°) and flipping (horizontal and vertical), are used to expand the number of PCB images, yielding the PCB image data set; finally, the PCB image data set and the xml annotation files are made into the VOC2007 data set format.
Step two, construct the graph network module and embed it into the ResNet101 network of the Faster RCNN. The backbone network used by the Faster RCNN in the present invention is ResNet101, which extracts the features of the PCB images. The design process of the graph network module is as follows: first, the similarity calculation function and the graph convolution layer structure are designed, and the output feature map of the upper convolution layer is divided into a grid of N small blocks, each grid cell serving as an input node of the graph network; that is, X represents the output feature map of the upper convolution layer and is divided into N regions of equal size, where node X_i represents the i-th of the N regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the N-1 regions other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N, and g(X_j) is a 1 × 1 convolution on X_j.
The node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j.
The above calculation is performed for every region node, finally yielding N new node outputs Z_i that form the corresponding feature map Z (of the same size as the input feature map X), which contains the correlation information between the regions.
The graph network module does not change the resolution of the feature map, so, as shown in FIG. 3, three graph network modules are embedded after the first three convolution modules (conv_1, conv_2 and conv_3) of the ResNet101 network respectively, enhancing the sensitivity of the network to target positions and improving target localization performance.
Step three, design the Faster R-CNN structure combined with FPN, applying the feature pyramid network FPN to ResNet101 and optimizing the parameters for the target characteristics, thereby improving the detection of small targets. The concrete structure (as shown in FIG. 3) is as follows: the output feature maps of the last residual block of each of the last four modules of the ResNet101 network are extracted, denoted C2, C3, C4 and C5. The C2, C3, C4 and C5 layers each pass through a 1 × 1 convolution kernel; the low-resolution, strongly semantic feature maps of the higher layers are upsampled by nearest-neighbor interpolation to the size of the layer below, added element-wise to the high-resolution, weakly semantic feature maps of the lower layers, and convolved with a 3 × 3 kernel to obtain the P2, P3, P4 and P5 layers respectively. The P6 layer is obtained by 0.5× down-sampling of the P5 layer.
Next, the Region Proposal Network (RPN) generates a series of region candidate boxes (Proposals) through its Anchor mechanism in the five feature layers P2, P3, P4, P5 and P6, and finally the prediction results of all layers are connected and fused. Each Proposal generated by the RPN is mapped to the corresponding feature layer according to its area, and the region of interest pooling (ROI Pooling) operation is carried out next. ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7.
After each feature map sample passes through two fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target. The loss function L is calculated and the parameters of the whole network are updated to obtain the trained model; the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise.

The factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal.
Step four, construct the similarity measurement module based on triplet loss and replace the classification branch at the end of the Faster RCNN network. Triplets (a, p, n) are selected with a semi-hard mining strategy, where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is

L = max(d(a, p) - d(a, n) + margin, 0)

Centering on the semi-hard region, the sample selection satisfies d(a, p) < d(a, n) < d(a, p) + margin.
A convolutional neural network is designed on this basis: the selected triplets are input into three convolutional neural networks with identical structure and shared weights, and through the triplet loss and model training the network learns discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module. This module is embedded at the back end of the Faster R-CNN model, replacing the original normalized exponential function (Softmax) classification structure, and performs label classification of the regions of interest (ROI) to obtain the class of each target, thereby improving the classification precision for small targets.
Step five, based on the above three steps, the overall design of the deep neural network is completed; the model is trained and its parameters are optimized with the multi-class small target data set, and finally the model is tested.
Referring to FIG. 2, which shows the basic flow of the multi-class small target detection method based on metric learning of the present invention: electronic components on a printed circuit board (PCB) are characterized by diversified types and packaging forms and a small visual area, so the detection of electronic components on a PCB is taken as an example to illustrate a specific embodiment of the invention, without limiting the technical content of the invention to this scope. The specific embodiment comprises the following steps:
Step one, construct the multi-class small target data set. In this embodiment, the electronic components on PCBs are taken as the research objects and a PCB data set is established; the specific process (see FIG. 1) is as follows:
First, various PCBs of different models are photographed with an industrial camera and the images are stored in JPEG format. Second, a classification criterion for the electronic components (i.e., category labels corresponding to electronic components of different types and packaging forms) is established according to their types and packaging forms; Labelme software is used for image annotation, marking the positions and corresponding category labels of the electronic components in each PCB image and producing an annotation file (json format) for each image, which is then converted into xml file format. Next, affine transformations, including random cropping, rotation (90°, 180°, 270°) and flipping (horizontal and vertical), are used to expand the number of PCB images, yielding the PCB image data set. Finally, the PCB image data set and the xml annotation files are made into the VOC2007 data set format, and the train, val and test txt files are generated.
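For illustration only, the affine expansion step can be sketched as follows. This is a minimal sketch assuming OpenCV and NumPy (the text does not name a library), and it omits the matching transformation of the annotation boxes, which must be applied consistently in practice.

```python
import cv2
import numpy as np

def augment(image):
    """Yield the expansion variants described above: rotations by 90/180/270
    degrees, horizontal/vertical flips, and one random crop (80% of each side,
    an illustrative ratio not taken from the text)."""
    yield cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
    yield cv2.rotate(image, cv2.ROTATE_180)
    yield cv2.rotate(image, cv2.ROTATE_90_COUNTERCLOCKWISE)
    yield cv2.flip(image, 1)   # horizontal flip
    yield cv2.flip(image, 0)   # vertical flip
    h, w = image.shape[:2]
    ch, cw = int(0.8 * h), int(0.8 * w)
    y0 = np.random.randint(0, h - ch + 1)
    x0 = np.random.randint(0, w - cw + 1)
    yield image[y0:y0 + ch, x0:x0 + cw]
```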
Step two, build the deep neural network and train it with the training and validation sets of the PCB data set to obtain a multi-class small target detection model for the electronic components on PCBs. The specific process is described taking a 1024 × 1024 input PCB image as an example:
(1) The backbone network used by the Faster RCNN in the invention is ResNet101, which extracts the features of the PCB image. The ResNet101 network comprises five convolution modules (conv_1, conv_2, conv_3, conv_4, conv_5). As shown in FIG. 3, taking the 1024 × 1024 input PCB image as an example, the feature map after conv_1 has size 256 × 256 and serves as the input of the graph network module. The design process of the graph network module is as follows: first, the similarity calculation function and the graph convolution layer structure are designed, and the output feature map of the upper convolution layer is divided into a grid of N small blocks, each grid cell serving as an input node of the graph network. That is, X represents the 256 × 256 feature map output by the conv_1 convolution layer, and X is divided into N = 32 × 32 = 1024 regions (each of size 8 × 8); node X_i represents the i-th 8 × 8 region of the 1024 regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the remaining 1023 regions of size 8 × 8 other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N = 1024.
The node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j.
The above calculation is performed for each of the 1024 region nodes, finally yielding 1024 new node outputs Z_i (each corresponding to an 8 × 8 region, the same size as X_i) that form the corresponding feature map Z (of size 256 × 256, equal to the input feature map X), which contains the correlation information between the regions.
The graph network module does not change the resolution of the feature map, so, as shown in FIG. 3, three graph network modules are embedded after the first three convolution modules (conv_1, conv_2 and conv_3) of the ResNet101 network respectively, enhancing the sensitivity of the network to target positions and improving target localization performance.
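For illustration, a minimal PyTorch sketch of the graph network module is given below. The text fixes only the normalization C(X) = N and the 1 × 1 convolution g; the dot-product form of f(X_i, X_j), the 1 × 1 convolution used for W_z, and the channel count are assumptions of this sketch, not the claimed design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.g = nn.Conv2d(channels, channels, kernel_size=1)    # g(X_j), as stated
        self.w_z = nn.Conv2d(channels, channels, kernel_size=1)  # W_z (assumed 1x1 conv)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, region=8):
        b, c, h, w = x.shape
        n = (h // region) * (w // region)   # N nodes, e.g. 1024 for a 256 x 256 map
        # Fold every region x region patch into one node vector.
        nodes = F.unfold(x, kernel_size=region, stride=region)          # (b, c*r*r, n)
        g_nodes = F.unfold(self.g(x), kernel_size=region, stride=region)
        f = torch.bmm(nodes.transpose(1, 2), nodes)                     # f(X_i, X_j), (b, n, n)
        y = torch.bmm(f, g_nodes.transpose(1, 2)) / n                   # Y_i = (1/N) sum_j f*g
        y = F.fold(y.transpose(1, 2), (h, w), kernel_size=region, stride=region)
        return self.relu(self.w_z(y) + x)                               # Z = ReLU(W_z*Y + X)

feat = torch.randn(1, 64, 256, 256)   # e.g. the conv_1 output of a 1024 x 1024 image
z = GraphModule(64)(feat)             # same 256 x 256 resolution, as stated above
```

The output resolution is unchanged, consistent with the statement that the module does not change the resolution of the feature map.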
(2) A Faster R-CNN structure combined with FPN is designed, applying the feature pyramid network FPN to ResNet101. The concrete structure (as shown in FIG. 3) is as follows: the ResNet101 network comprises five convolution modules (conv_1, conv_2, conv_3, conv_4, conv_5), and the output feature maps of the last residual block of the last four modules are extracted, denoted C2, C3, C4 and C5. Taking the 1024 × 1024 input PCB image as an example, the feature map sizes from C2 to C5 are, in turn, 256 × 256 × 256, 128 × 128 × 512, 64 × 64 × 1024 and 32 × 32 × 2048; compared with the original image, the C2 layer is reduced 4 times, C3 8 times, C4 16 times and C5 32 times. C2, C3, C4 and C5 each pass through a 1 × 1 convolution kernel so that the number of channels in every layer is unified to 256 without changing the feature map size. The low-resolution, strongly semantic feature maps of the higher layers are upsampled by 2× nearest-neighbor interpolation to the size of the layer below and added element-wise to the high-resolution, weakly semantic feature maps of the lower layers, giving the preliminary P2, P3 and P4 layers. Each merged feature map (i.e., P2, P3 and P4) is then passed through a 3 × 3 convolution kernel to weaken the aliasing effect of upsampling, giving the final P2, P3 and P4 layers. The P5 layer is obtained directly, without upsampling or the 3 × 3 convolution. The P6 layer is obtained by 0.5× down-sampling of the P5 layer and has size 16 × 16 × 256.
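A minimal sketch of this top-down structure, assuming the stated channel counts (256/512/1024/2048, unified to 256); stride-2 max pooling stands in for the 0.5× down-sampling of P5, which the text does not specify further.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in range(3)])  # 3x3 convs for P2-P4 only

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)   # obtained directly, no upsampling or 3x3 conv
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode='nearest')
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode='nearest')
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode='nearest')
        p2, p3, p4 = (self.smooth[i](p) for i, p in enumerate((p2, p3, p4)))
        p6 = F.max_pool2d(p5, kernel_size=1, stride=2)  # 0.5x down-sampling of P5
        return p2, p3, p4, p5, p6
```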
Next, the RPN generates a series of Proposals through its Anchor mechanism in the five feature layers of different sizes P2, P3, P4, P5 and P6, each layer making independent target candidate box predictions, and finally the prediction results of all layers are connected and fused. The RPN structure is a 3 × 3 convolution layer followed by two convolution output branches: the left branch outputs the probability that a candidate region is a target, and the right branch outputs the top-left coordinates and the width and height of the candidate bounding box. During RPN training, candidates whose intersection-over-union with a labelled box exceeds 0.7 receive a positive label (target) and those below 0.3 a negative label (background). Under the feature pyramid network FPN, the Anchor boxes use the three aspect ratios 1:1, 2:1 and 1:2, and, according to the sizes of the electronic components, the Anchor side lengths of the five prediction layers are 32, 64, 128, 256 and 512 respectively, giving 15 Anchors of different shapes in total.
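A small sketch of the Anchor shapes just described: the three aspect ratios at each of the five side lengths give the stated 15 shapes. The centering convention and the area-preserving handling of the ratios are assumptions of the sketch.

```python
def make_anchors(cx, cy, side):
    """Return (x1, y1, x2, y2) anchor boxes of the three stated aspect ratios,
    centered at (cx, cy), with area side * side preserved across ratios."""
    boxes = []
    for rw, rh in [(1, 1), (2, 1), (1, 2)]:           # ratios 1:1, 2:1, 1:2
        w = side * (rw / rh) ** 0.5
        h = side * (rh / rw) ** 0.5
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Side lengths 32..512 on P2..P6: 5 levels x 3 ratios = 15 anchor shapes in total.
anchors = [make_anchors(0, 0, s) for s in (32, 64, 128, 256, 512)]
```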
Each Proposal box generated by the RPN is mapped, according to its area (w × h), to the corresponding feature layer P_k, on which the next ROI Pooling step is performed. The value of k is calculated as follows, where k_0 = 4 and w and h are the width and height of the bounding box:

k = ⌊k_0 + log2(√(w·h) / 224)⌋

(final value range of k: 2, 3, 4, 5)
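The assignment rule can be computed directly; k_0 = 4 and the clamp to layers P2-P5 follow the text, while the canonical size 224 is assumed from the standard FPN formulation.

```python
import math

def fpn_level(w, h, k0=4, canonical=224):
    """Map a Proposal of size w x h to its pooling layer index k (P2..P5)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(max(k, 2), 5)

fpn_level(32, 32)    # -> 2: a 32 x 32 Proposal is pooled from the P2 layer
fpn_level(448, 448)  # -> 5: a large Proposal is pooled from the P5 layer
```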
ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7, guaranteeing that the features entering the fully connected layers have a consistent size. After each feature map sample passes through two 1024-d fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, yielding the class of the electronic component, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target. The loss function L is calculated and the parameters of the whole network are updated to obtain the trained model; the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise.

The factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal.
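A sketch of this combined loss, assuming per-Proposal class scores, box deltas and labels as inputs; normalizing the regression term by the number of positive samples and taking λ = 1 are illustrative choices, not values fixed by the text.

```python
import torch
import torch.nn.functional as F

def smooth_l1(x):
    # 0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise
    return torch.where(x.abs() < 1, 0.5 * x * x, x.abs() - 0.5)

def detection_loss(cls_scores, labels, box_deltas, box_targets, lam=1.0):
    l_cls = F.cross_entropy(cls_scores, labels)       # averaged over N_cls samples
    pos = (labels > 0).float().unsqueeze(1)           # p_i*: activates L_reg for positives
    l_reg = (pos * smooth_l1(box_deltas - box_targets)).sum() / pos.sum().clamp(min=1)
    return l_cls + lam * l_reg
```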
(3) A similarity measurement module based on triplet loss is constructed to replace the classification branch at the end of the Faster RCNN network, as shown in FIG. 3. The selection strategy for the triplets is crucial: randomly selecting positive and negative samples easily makes model convergence slow, while mining only hard examples easily causes model collapse. Therefore, a semi-hard mining strategy is adopted to select the triplets (a, p, n), where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is then:

L = max(d(a, p) - d(a, n) + margin, 0)

Centering on the semi-hard region, a target of a different class with larger similarity to the sample is selected as the negative pair, and a target of the same class with the smallest similarity to the sample is selected as the positive pair, i.e., the selection satisfies d(a, p) < d(a, n) < d(a, p) + margin.
On this basis, a triplet convolutional neural network is designed: the selected triplets are input into three convolutional neural networks with identical structure and shared weights, where the shared network consists of an input layer, two convolution layers with 3 × 3 kernels, a max pooling layer and a fully connected layer. Through the triplet loss and model training, the network learns discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module. The original Softmax classification structure is replaced: this module is embedded into the classification branch at the end of the Faster R-CNN model and obtains the class of each target, thereby improving the classification precision for small targets.
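A sketch of the triplet branch, assuming Euclidean distance for d and hypothetical channel sizes; the shared network mirrors the stated structure (two 3 × 3 convolution layers, max pooling, a fully connected layer), but its exact dimensions are not given in the text.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """One shared network; applying it to a, p and n realizes the
    'three networks with the same structure and shared weights'."""
    def __init__(self, in_channels=256, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                         # 7x7 ROI features -> 3x3
            nn.Flatten(),
            nn.Linear(128 * 3 * 3, dim))

    def forward(self, x):
        return self.net(x)

def triplet_loss(net, a, p, n, margin=0.2):
    d_ap = (net(a) - net(p)).pow(2).sum(1).sqrt()    # d(a, p)
    d_an = (net(a) - net(n)).pow(2).sum(1).sqrt()    # d(a, n)
    # Semi-hard triplets satisfy d(a,p) < d(a,n) < d(a,p) + margin.
    return torch.clamp(d_ap - d_an + margin, min=0).mean()
```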
(4) The deep neural network obtained in the above three steps is trained end-to-end on the training and validation sets of the PCB data set; for each picture input to the neural network, the forward and backward propagation steps are executed and the internal parameters of the model are updated based on the loss function L({p_i}, {t_i}), yielding a multi-class small target detection model for detecting the electronic components on PCB images.
Step three, the test set of the PCB data set is input as test examples into the trained deep neural network model, and the electronic component targets in the PCB images are detected. The specific process is as follows:
(1) A group of PCB images to be tested is input, with the maximum side length of the input image limited to 1024; after feature extraction by the ResNet network and the FPN network, 400 candidate target regions (Proposals) are obtained in the image through the RPN;
(2) ROI Pooling takes the original image feature maps and each candidate target region as input, extracts the feature maps of the candidate target regions, and outputs 7 × 7 feature maps of uniform size for the subsequent detection box regression and target classification;
(3) The similarity measurement module based on triplet loss takes the Proposal features as input, extracts discriminative features through the shared network, and outputs the corresponding target classes; the accurate rectangular position of each target detection box is obtained by passing the Proposal feature information through the fully connected layers and the box regression. Finally, all circumscribed rectangles marked as electronic component targets, together with their classes, are drawn in the original image;
(4) The results are evaluated with the average precision (AP) and the mean average precision (mAP). False Negative (FN): judged a negative sample but actually a positive sample; False Positive (FP): judged a positive sample but actually a negative sample; True Negative (TN): judged a negative sample and actually a negative sample; True Positive (TP): judged a positive sample and actually a positive sample. Precision = TP / (TP + FP) and Recall = TP / (TP + FN); the two-dimensional curve with precision and recall as the vertical and horizontal axes is the precision-recall (P-R) curve. The average precision AP of each category is the area under the P-R curve of that category, and the mean average precision mAP is the mean of the AP values over all categories.
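A sketch of the per-class AP computation described above, assuming the detections of one class have already been matched to ground truth (a TP/FP flag per detection, sorted by confidence); the monotone precision envelope is a common implementation choice and is not specified in the text.

```python
import numpy as np

def average_precision(tp_flags, n_gt):
    """Area under the precision-recall curve for one class.
    tp_flags: 1 for a true positive, 0 for a false positive, sorted by confidence.
    n_gt: number of ground-truth boxes of this class (TP + FN)."""
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1 - tp_flags)
    recall = tp / n_gt                 # TP / (TP + FN)
    precision = tp / (tp + fp)         # TP / (TP + FP)
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, envelope):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# mAP is the mean of the per-class AP values over all categories.
```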

Claims (3)

1. A multi-class small target detection method based on metric learning is characterized by comprising the following steps:
Step 1: construct the multi-class small target data set: photograph various PCBs of different models with an industrial camera and store the images in JPEG format; establish a classification criterion for the electronic components according to their different types and packaging forms, and use Labelme software for image annotation to obtain annotation files in xml format; expand the number of PCB images with affine transformations to obtain the PCB image data set; make the PCB image data set and the xml annotation files into the VOC2007 data set format;
Step 2: construct the graph network module and embed it into the ResNet101 network of the Faster RCNN, where the ResNet101 network comprises the five convolution modules conv_1, conv_2, conv_3, conv_4 and conv_5; the concrete steps are as follows: design the similarity calculation function and the graph convolution layer structure, and divide the output feature map of the upper convolution layer into a grid of N small blocks, each grid cell serving as an input node of the graph network; that is, X represents the output feature map of the upper convolution layer and is divided into N regions of equal size, where node X_i represents the i-th of the N regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the N-1 regions other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N, and g(X_j) is a convolution operation on X_j with a 1 × 1 kernel;
the node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j;
the above calculation is performed for every region node, finally yielding N new node outputs Z_i, which form the corresponding feature map Z; Z is equal in size to the input feature map X and contains the correlation information between the regions;
three graph network modules are embedded after the first three convolution modules conv_1, conv_2 and conv_3 of the ResNet101 network, respectively;
Step 3: design the Faster R-CNN structure combined with FPN, applying the feature pyramid network FPN in ResNet101: extract the output feature maps of the last residual block of each of the last four modules of the ResNet101 network, denoted C2, C3, C4 and C5; pass the C2, C3, C4 and C5 layers through 1 × 1 convolution kernels; upsample the low-resolution, strongly semantic feature maps of the higher layer by nearest-neighbor interpolation to the size of the layer below, add them element-wise to the high-resolution, weakly semantic feature maps of the lower layer, and convolve with a 3 × 3 kernel to obtain the P2, P3, P4 and P5 layers respectively; the P6 layer is obtained by 0.5× down-sampling of the P5 layer;
next, the RPN generates a series of region candidate boxes in the five feature layers P2, P3, P4, P5 and P6 through its anchor mechanism, and finally the prediction results of all layers are connected and fused; each Proposal generated by the RPN is mapped to the corresponding feature layer according to its area, and the region of interest pooling (ROI Pooling) operation is performed next; ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7;
after each feature map sample passes through two fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target; the loss function L is calculated and the parameters of the whole network are updated to obtain the trained model, where the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise;

the factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal;
Step 4: construct the similarity measurement module based on triplet loss and replace the classification branch at the end of the Faster RCNN network; select the triplets (a, p, n) with a semi-hard mining strategy, where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is

L = max(d(a, p) - d(a, n) + margin, 0)

centering on the semi-hard region, samples are selected to satisfy d(a, p) < d(a, n) < d(a, p) + margin;
on this basis, design the convolutional neural network: input the selected triplets into three convolutional neural networks with identical structure and shared weights, and let the network learn, through the triplet loss and model training, discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module; embed this module at the back end of the Faster R-CNN model, replacing the original normalized exponential function classification structure, and perform label classification of the regions of interest to obtain the class of each target;
Step 5: train the deep neural network obtained in steps 2-4 end-to-end on the training and validation sets of the PCB data set; for each picture input to the neural network, execute the forward and backward propagation steps and update the internal parameters of the model based on the loss function L({p_i}, {t_i}), obtaining a multi-class small target detection model for detecting the electronic components on PCB images;
Step 6: input the test set of the PCB data set into the trained deep neural network model and detect the electronic component targets in the PCB images.
2. The method of claim 1, wherein the quantity expansion in step 1 comprises random cropping, rotation and flipping.
3. The method of claim 1, wherein N is 1024.
CN202010583655.1A 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning Pending CN111783590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010583655.1A CN111783590A (en) 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010583655.1A CN111783590A (en) 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning

Publications (1)

Publication Number Publication Date
CN111783590A true CN111783590A (en) 2020-10-16

Family

ID=72757220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010583655.1A Pending CN111783590A (en) 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning

Country Status (1)

Country Link
CN (1) CN111783590A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122003A (en) * 2017-12-19 2018-06-05 西北工业大学 A kind of Weak target recognition methods based on deep neural network
CN109711474A (en) * 2018-12-24 2019-05-03 中山大学 A kind of aluminium material surface defects detection algorithm based on deep learning
CN110070536A (en) * 2019-04-24 2019-07-30 南京邮电大学 A kind of pcb board component detection method based on deep learning
CN110287998A (en) * 2019-05-28 2019-09-27 浙江工业大学 A kind of scientific and technical literature picture extracting method based on Faster-RCNN
CN110321815A (en) * 2019-06-18 2019-10-11 中国计量大学 A kind of crack on road recognition methods based on deep learning
CN110348437A (en) * 2019-06-27 2019-10-18 电子科技大学 It is a kind of based on Weakly supervised study with block the object detection method of perception
CN110910421A (en) * 2019-11-11 2020-03-24 西北工业大学 Weak and small moving object detection method based on block characterization and variable neighborhood clustering
CN111242026A (en) * 2020-01-13 2020-06-05 中国矿业大学 Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN111310756A (en) * 2020-01-20 2020-06-19 陕西师范大学 Damaged corn particle detection and classification method based on deep learning
CN111274972A (en) * 2020-01-21 2020-06-12 北京妙医佳健康科技集团有限公司 Dish identification method and device based on metric learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

Title
BING HU et al.: "Detection of PCB Surface Defects with Improved Faster-RCNN and Feature Pyramid Network", Digital Object Identifier, 31 December 2017 *
CHIA-WEN KUO et al.: "Data-Efficient Graph Embedding Learning for PCB Component Detection", arXiv:1811.06994v2, 20 November 2018, p. 1 *
XIAOLONG WANG et al.: "Non-local Neural Networks", CVPR, 31 December 2018, pp. 7794-7803 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180903A (en) * 2020-10-19 2021-01-05 江苏中讯通物联网技术有限公司 Vehicle state real-time detection system based on edge calculation
CN112418278A (en) * 2020-11-05 2021-02-26 中保车服科技服务股份有限公司 Multi-class object detection method, terminal device and storage medium
CN112364754B (en) * 2020-11-09 2024-05-14 云南电网有限责任公司迪庆供电局 Bolt defect detection method and system
CN112364754A (en) * 2020-11-09 2021-02-12 云南电网有限责任公司迪庆供电局 Bolt defect detection method and system
CN112364778A (en) * 2020-11-12 2021-02-12 上海明华电力科技有限公司 Power plant safety behavior information automatic detection method based on deep learning
CN112836719B (en) * 2020-12-11 2024-01-05 南京富岛信息工程有限公司 Indicator diagram similarity detection method integrating two classifications and triplets
CN112836719A (en) * 2020-12-11 2021-05-25 南京富岛信息工程有限公司 Indicator diagram similarity detection method fusing two classifications and three groups
CN112560853A (en) * 2020-12-14 2021-03-26 中科云谷科技有限公司 Image processing method, device and storage medium
CN112669264A (en) * 2020-12-17 2021-04-16 国网山西省电力公司运城供电公司 Artificial intelligence defect identification method and system for unmanned aerial vehicle routing inspection of distribution network line
CN112800955A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112800257A (en) * 2021-02-10 2021-05-14 上海零眸智能科技有限公司 Method for quickly adding sample training data based on image searching
CN112949520A (en) * 2021-03-10 2021-06-11 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN112801058A (en) * 2021-04-06 2021-05-14 艾伯资讯(深圳)有限公司 UML picture identification method and system
CN112801058B (en) * 2021-04-06 2021-06-29 艾伯资讯(深圳)有限公司 UML picture identification method and system
CN113283513A (en) * 2021-05-31 2021-08-20 西安电子科技大学 Small sample target detection method and system based on target interchange and metric learning
CN113283513B (en) * 2021-05-31 2022-12-13 西安电子科技大学 Small sample target detection method and system based on target interchange and metric learning
CN113361437A (en) * 2021-06-16 2021-09-07 吉林建筑大学 Method and system for detecting category and position of minimally invasive surgical instrument
CN113420648A (en) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113487551A (en) * 2021-06-30 2021-10-08 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving performance of dense target based on deep learning
CN113487551B (en) * 2021-06-30 2024-01-16 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving dense target performance based on deep learning
CN113435389A (en) * 2021-07-09 2021-09-24 大连海洋大学 Chlorella and chrysophyceae classification and identification method based on image feature deep learning
CN113435389B (en) * 2021-07-09 2024-03-01 大连海洋大学 Chlorella and golden algae classification and identification method based on image feature deep learning
CN113469272B (en) * 2021-07-20 2023-05-19 东北财经大学 Target detection method for hotel scene picture based on fast R-CNN-FFS model
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN113313082A (en) * 2021-07-28 2021-08-27 北京电信易通信息技术股份有限公司 Target detection method and system based on multitask loss function
CN115100419A (en) * 2022-07-20 2022-09-23 中国科学院自动化研究所 Target detection method and device, electronic equipment and storage medium
CN115965915A (en) * 2022-11-01 2023-04-14 哈尔滨市科佳通用机电股份有限公司 Wagon connecting pull rod fracture fault identification method and system based on deep learning
CN115965915B (en) * 2022-11-01 2023-09-08 哈尔滨市科佳通用机电股份有限公司 Railway wagon connecting pull rod breaking fault identification method and system based on deep learning
CN115984846A (en) * 2023-02-06 2023-04-18 山东省人工智能研究院 Intelligent identification method for small target in high-resolution image based on deep learning
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning

Similar Documents

Publication Publication Date Title
CN111783590A (en) Multi-class small target detection method based on metric learning
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN109285139A (en) A kind of x-ray imaging weld inspection method based on deep learning
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN110246181B (en) Anchor point-based attitude estimation model training method, attitude estimation method and system
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN113313094B (en) Vehicle-mounted image target detection method and system based on convolutional neural network
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
Li et al. A review of deep learning methods for pixel-level crack detection
CN114998566A (en) Interpretable multi-scale infrared small and weak target detection network design method
CN110738132A (en) target detection quality blind evaluation method with discriminant perception capability
CN115620141A (en) Target detection method and device based on weighted deformable convolution
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
CN113361496B (en) City built-up area statistical method based on U-Net
CN115147644A (en) Method, system, device and storage medium for training and describing image description model
CN113361528B (en) Multi-scale target detection method and system
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN114187506A (en) Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network
CN117372898A (en) Unmanned aerial vehicle aerial image target detection method based on improved yolov8
CN116258877A (en) Land utilization scene similarity change detection method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201016)