CN111783590A - Multi-class small target detection method based on metric learning - Google Patents

Multi-class small target detection method based on metric learning

Info

Publication number
CN111783590A
Authority
CN
China
Prior art keywords
network
node
layer
loss
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010583655.1A
Other languages
Chinese (zh)
Inventor
王靖宇
王叶子
张科
吴虞霖
王霰禹
张国俊
苏雨
王震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202010583655.1A
Publication of CN111783590A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Abstract

The invention relates to a multi-class small target detection method based on metric learning. According to the recognition characteristics of multi-class small targets, the feature expression capability of deep learning is combined with the similarity discrimination capability of metric learning, and a novel deep neural network structure is designed. A Faster RCNN network structure combined with a feature pyramid network (FPN) detects multi-class small targets from the whole image data; a graph network module embedded in the network transmits and computes the similarity information among all regions of the image; and a similarity measurement module based on triplet loss at the back end of the network distinguishes the detail information between samples. The feature information of the small targets and the similarity relations between targets are thereby fully extracted, improving the accuracy of multi-class small target detection.

Description

Multi-class small target detection method based on metric learning
Technical Field
The invention relates to a multi-class small target detection method based on metric learning, and belongs to the technical field of image processing.
Background
Target detection is a key research topic in the field of computer vision, and target detection technology is now widely applied in automatic industrial inspection, medical imaging diagnosis, remote sensing image analysis and other fields (Khosravan N, Bagci U. S4ND: Single-Shot Single-Scale Lung Nodule Detection [C]// International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, Cham, 2018: 794-). Multi-class small target detection refers to a detection task in which an image contains two or more different target classes and the absolute size of each target is less than 32 × 32 pixels, or its relative size is less than 0.1 times that of the original image (M. Kisantal, Z. Wojna, J. Murawski, et al. Augmentation for small object detection [J]. arXiv preprint arXiv:1902.07296, 2019.); it has wide application prospects in many fields. Small targets yield less feature information, and multiple classes introduce larger inter-class similarity and intra-class difference, so the detection of multi-class small-size targets has become a research hotspot and difficulty in the field of target detection.
Multi-scale feature combination methods merge upper-layer and bottom-layer feature information, enriching the semantic information of the bottom layer while keeping a high resolution, and can effectively improve the localization and detection precision of a neural network for small-size targets (Zheng Qiumei et al. Small target detection in traffic scenes based on an improved convolutional neural network [J/OL]: 1-9). However, such methods consider only the spatial information of the feature map and ignore the interrelationships between the various targets, so it is difficult to classify small targets accurately in a multi-class detection setting. Exploring a technical route that achieves accurate localization and classification of multi-class small targets therefore has important research significance and application value.
Disclosure of Invention
Technical problem to be solved
In complex multi-class small target detection scenes, the scarce visual feature information and the large inter-class similarity and intra-class difference make existing deep neural networks, which consider only the characteristics of the small targets themselves, perform poorly, causing missed detections and class confusion; capturing the interrelations between targets is therefore as important as the targets' own characteristics. In order to avoid the defects of the prior art, the invention provides a multi-class small target detection method based on metric learning.
Technical scheme
A multi-class small target detection method based on metric learning is characterized by comprising the following steps:
Step 1: construct the multi-class small target data set: photograph various PCBs of different models with an industrial camera and store the images in JPEG format; establish a classification criterion for the electronic components according to their different types and packaging forms, and use Labelme software for image annotation to obtain annotation files in xml format; expand the number of PCB images with affine transformations to obtain the PCB image data set; make the PCB image data set and the xml annotation files into the VOC2007 data set format;
Step 2: construct the graph network module and embed it into the ResNet101 network of the Faster RCNN, where the ResNet101 network comprises the five convolution modules conv_1, conv_2, conv_3, conv_4 and conv_5; the concrete steps are as follows: design the similarity calculation function and the graph convolution layer structure, and divide the output feature map of the upper convolution layer into a grid of N small blocks, each grid cell serving as an input node of the graph network; that is, X represents the output feature map of the upper convolution layer and is divided into N regions of equal size, where node X_i represents the i-th of the N regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the N-1 regions other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N, and g(X_j) is a convolution operation on X_j with a 1 × 1 kernel;
the node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j;
the above calculation is performed for every region node, finally yielding N new node outputs Z_i, which form the corresponding feature map Z; Z is equal in size to the input feature map X and contains the correlation information between the regions;
three graph network modules are embedded after the first three convolution modules conv_1, conv_2 and conv_3 of the ResNet101 network, respectively;
Step 3: design the Faster R-CNN structure combined with FPN, applying the feature pyramid network FPN in ResNet101: extract the output feature maps of the last residual block of each of the last four modules of the ResNet101 network, denoted C2, C3, C4 and C5; pass the C2, C3, C4 and C5 layers through 1 × 1 convolution kernels; upsample the low-resolution, strongly semantic feature maps of the higher layer by nearest-neighbor interpolation to the size of the layer below, add them element-wise to the high-resolution, weakly semantic feature maps of the lower layer, and convolve with a 3 × 3 kernel to obtain the P2, P3, P4 and P5 layers respectively; the P6 layer is obtained by 0.5× down-sampling of the P5 layer;
next, the RPN generates a series of region candidate boxes in the five feature layers P2, P3, P4, P5 and P6 through its anchor mechanism, and finally the prediction results of all layers are connected and fused; each Proposal generated by the RPN is mapped to the corresponding feature layer according to its area, and the region of interest pooling (ROI Pooling) operation is performed next; ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7;
after each feature map sample passes through two fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target; the loss function L is calculated and the parameters of the whole network are updated to obtain the trained model, where the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise;

the factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal;
Step 4: construct the similarity measurement module based on triplet loss and replace the classification branch at the end of the Faster RCNN network; select the triplets (a, p, n) with a semi-hard mining strategy, where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is

L = max(d(a, p) - d(a, n) + margin, 0)

centering on the semi-hard region, samples are selected to satisfy d(a, p) < d(a, n) < d(a, p) + margin;
on this basis, design the convolutional neural network: input the selected triplets into three convolutional neural networks with identical structure and shared weights, and let the network learn, through the triplet loss and model training, discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module; embed this module at the back end of the Faster R-CNN model, replacing the original normalized exponential function classification structure, and perform label classification of the regions of interest to obtain the class of each target;
Step 5: train the deep neural network obtained in steps 2-4 end-to-end on the training and validation sets of the PCB data set; for each picture input to the neural network, execute the forward and backward propagation steps and update the internal parameters of the model based on the loss function L({p_i}, {t_i}), obtaining a multi-class small target detection model for detecting the electronic components on PCB images;
Step 6: input the test set of the PCB data set into the trained deep neural network model and detect the electronic component targets in the PCB images.
The quantity expansion in step 1 includes random cropping, rotation and flipping.
N is 1024.
Advantageous effects
The invention provides a multi-class small target detection method based on metric learning. Aiming at the recognition characteristics of multi-class small targets, the feature expression capability of deep learning is combined with the similarity discrimination capability of metric learning, and a novel deep neural network structure is designed. A Faster RCNN network structure combined with a feature pyramid network (FPN) detects multi-class small targets from the whole image data; a graph network module embedded in the network transmits and computes the similarity information among all regions of the image; and a similarity measurement module based on triplet loss at the back end of the network distinguishes the detail information between samples. The feature information of the small targets and the similarity relations between targets are thereby fully extracted, improving the accuracy of multi-class small target detection.
The method has the advantages that:
(1) Through the second step of the invention, the graph network module computes the correlation relations among all regions of the image in a cross-region manner, enhancing the sensitivity of the network to target positions and improving target localization performance.
(2) Through the third step of the invention, the FPN is combined with the Faster RCNN; the multi-scale feature fusion avoids the loss of small-target detail information and thus strengthens the representation of small-target features.
(3) Through the fourth step of the invention, the similarity measurement module based on triplet loss performs ROI label classification for the various targets, replacing the Softmax layer, whose features are separable but insufficiently discriminative; the network thus learns discriminative features sufficient to distinguish the detail information between classes, improving the classification accuracy for small targets.
Drawings
FIG. 1 is a data set construction flow diagram
FIG. 2 is an algorithm flow chart
FIG. 3 is a diagram of a deep neural network architecture
FIG. 4 is a graph of test results
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
the invention aims to provide a multi-class small target detection method based on metric learning, which is realized by the following technical scheme and comprises the following specific steps:
Step one, construct the multi-class small target data set. Taking the electronic components on printed circuit boards (PCBs) as the research objects, a PCB data set is established; the specific process is as follows: first, various PCBs of different models are photographed with an industrial camera and the images are stored in JPEG format; second, a classification criterion for the electronic components is established according to their types and packaging forms (i.e., category labels corresponding to electronic components of different types and packaging forms), and Labelme software is used for image annotation, producing annotation files in xml format; next, affine transformations, including random cropping, rotation (90°, 180°, 270°) and flipping (horizontal and vertical), are used to expand the number of PCB images, yielding the PCB image data set; finally, the PCB image data set and the xml annotation files are made into the VOC2007 data set format.
Step two, construct the graph network module and embed it into the ResNet101 network of the Faster RCNN. The backbone network used by the Faster RCNN in the present invention is ResNet101, which extracts the features of the PCB images. The design process of the graph network module is as follows: first, the similarity calculation function and the graph convolution layer structure are designed, and the output feature map of the upper convolution layer is divided into a grid of N small blocks, each grid cell serving as an input node of the graph network; that is, X represents the output feature map of the upper convolution layer and is divided into N regions of equal size, where node X_i represents the i-th of the N regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the N-1 regions other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N, and g(X_j) is a 1 × 1 convolution on X_j.
The node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j.
The above calculation is performed for every region node, finally yielding N new node outputs Z_i that form the corresponding feature map Z (of the same size as the input feature map X), which contains the correlation information between the regions.
The graph network module does not change the resolution of the feature map, so, as shown in FIG. 3, three graph network modules are embedded after the first three convolution modules (conv_1, conv_2 and conv_3) of the ResNet101 network respectively, enhancing the sensitivity of the network to target positions and improving target localization performance.
Step three, design the Faster R-CNN structure combined with FPN, applying the feature pyramid network FPN to ResNet101 and optimizing the parameters for the target characteristics, thereby improving the detection of small targets. The concrete structure (as shown in FIG. 3) is as follows: the output feature maps of the last residual block of each of the last four modules of the ResNet101 network are extracted, denoted C2, C3, C4 and C5. The C2, C3, C4 and C5 layers each pass through a 1 × 1 convolution kernel; the low-resolution, strongly semantic feature maps of the higher layers are upsampled by nearest-neighbor interpolation to the size of the layer below, added element-wise to the high-resolution, weakly semantic feature maps of the lower layers, and convolved with a 3 × 3 kernel to obtain the P2, P3, P4 and P5 layers respectively. The P6 layer is obtained by 0.5× down-sampling of the P5 layer.
Next, the Region Proposal Network (RPN) generates a series of region candidate boxes (Proposals) through its Anchor mechanism in the five feature layers P2, P3, P4, P5 and P6, and finally the prediction results of all layers are connected and fused. Each Proposal generated by the RPN is mapped to the corresponding feature layer according to its area, and the region of interest pooling (ROI Pooling) operation is carried out next. ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7.
After each feature map sample passes through two fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target. The loss function L is calculated and the parameters of the whole network are updated to obtain the trained model; the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise.

The factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal.
Step four, construct the similarity measurement module based on triplet loss and replace the classification branch at the end of the Faster RCNN network. Triplets (a, p, n) are selected with a semi-hard mining strategy, where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is

L = max(d(a, p) - d(a, n) + margin, 0)

Centering on the semi-hard region, the sample selection satisfies d(a, p) < d(a, n) < d(a, p) + margin.
A convolutional neural network is designed on this basis: the selected triplets are input into three convolutional neural networks with identical structure and shared weights, and through the triplet loss and model training the network learns discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module. This module is embedded at the back end of the Faster R-CNN model, replacing the original normalized exponential function (Softmax) classification structure, and performs label classification of the regions of interest (ROI) to obtain the class of each target, thereby improving the classification precision for small targets.
Step five, based on the above three steps, the overall design of the deep neural network is completed; the model is trained and its parameters are optimized with the multi-class small target data set, and finally the model is tested.
Referring to FIG. 2, which shows the basic flow of the multi-class small target detection method based on metric learning of the present invention: electronic components on a printed circuit board (PCB) are characterized by diversified types and packaging forms and a small visual area, so the detection of electronic components on a PCB is taken as an example to illustrate a specific embodiment of the invention, without limiting the technical content of the invention to this scope. The specific embodiment comprises the following steps:
Step one, construct the multi-class small target data set. In this embodiment, the electronic components on PCBs are taken as the research objects and a PCB data set is established; the specific process (see FIG. 1) is as follows:
First, various PCBs of different models are photographed with an industrial camera and the images are stored in JPEG format. Second, a classification criterion for the electronic components (i.e., category labels corresponding to electronic components of different types and packaging forms) is established according to their types and packaging forms; Labelme software is used for image annotation, marking the positions and corresponding category labels of the electronic components in each PCB image and producing an annotation file (json format) for each image, which is then converted into xml file format. Next, affine transformations, including random cropping, rotation (90°, 180°, 270°) and flipping (horizontal and vertical), are used to expand the number of PCB images, yielding the PCB image data set. Finally, the PCB image data set and the xml annotation files are made into the VOC2007 data set format, and the train, val and test txt files are generated.
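For illustration only, the affine expansion step can be sketched as follows. This is a minimal sketch assuming OpenCV and NumPy (the text does not name a library), and it omits the matching transformation of the annotation boxes, which must be applied consistently in practice.

```python
import cv2
import numpy as np

def augment(image):
    """Yield the expansion variants described above: rotations by 90/180/270
    degrees, horizontal/vertical flips, and one random crop (80% of each side,
    an illustrative ratio not taken from the text)."""
    yield cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
    yield cv2.rotate(image, cv2.ROTATE_180)
    yield cv2.rotate(image, cv2.ROTATE_90_COUNTERCLOCKWISE)
    yield cv2.flip(image, 1)   # horizontal flip
    yield cv2.flip(image, 0)   # vertical flip
    h, w = image.shape[:2]
    ch, cw = int(0.8 * h), int(0.8 * w)
    y0 = np.random.randint(0, h - ch + 1)
    x0 = np.random.randint(0, w - cw + 1)
    yield image[y0:y0 + ch, x0:x0 + cw]
```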
Step two, build the deep neural network and train it with the training and validation sets of the PCB data set to obtain a multi-class small target detection model for the electronic components on PCBs. The specific process is described taking a 1024 × 1024 input PCB image as an example:
(1) The backbone network used by the Faster RCNN in the invention is ResNet101, which extracts the features of the PCB image. The ResNet101 network comprises five convolution modules (conv_1, conv_2, conv_3, conv_4, conv_5). As shown in FIG. 3, taking the 1024 × 1024 input PCB image as an example, the feature map after conv_1 has size 256 × 256 and serves as the input of the graph network module. The design process of the graph network module is as follows: first, the similarity calculation function and the graph convolution layer structure are designed, and the output feature map of the upper convolution layer is divided into a grid of N small blocks, each grid cell serving as an input node of the graph network. That is, X represents the 256 × 256 feature map output by the conv_1 convolution layer, and X is divided into N = 32 × 32 = 1024 regions (each of size 8 × 8); node X_i represents the i-th 8 × 8 region of the 1024 regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the remaining 1023 regions of size 8 × 8 other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N = 1024.
The node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j.
The above calculation is performed for each of the 1024 region nodes, finally yielding 1024 new node outputs Z_i (each corresponding to an 8 × 8 region, the same size as X_i) that form the corresponding feature map Z (of size 256 × 256, equal to the input feature map X), which contains the correlation information between the regions.
The graph network module does not change the resolution of the feature map, so, as shown in FIG. 3, three graph network modules are embedded after the first three convolution modules (conv_1, conv_2 and conv_3) of the ResNet101 network respectively, enhancing the sensitivity of the network to target positions and improving target localization performance.
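For illustration, a minimal PyTorch sketch of the graph network module is given below. The text fixes only the normalization C(X) = N and the 1 × 1 convolution g; the dot-product form of f(X_i, X_j), the 1 × 1 convolution used for W_z, and the channel count are assumptions of this sketch, not the claimed design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.g = nn.Conv2d(channels, channels, kernel_size=1)    # g(X_j), as stated
        self.w_z = nn.Conv2d(channels, channels, kernel_size=1)  # W_z (assumed 1x1 conv)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, region=8):
        b, c, h, w = x.shape
        n = (h // region) * (w // region)   # N nodes, e.g. 1024 for a 256 x 256 map
        # Fold every region x region patch into one node vector.
        nodes = F.unfold(x, kernel_size=region, stride=region)          # (b, c*r*r, n)
        g_nodes = F.unfold(self.g(x), kernel_size=region, stride=region)
        f = torch.bmm(nodes.transpose(1, 2), nodes)                     # f(X_i, X_j), (b, n, n)
        y = torch.bmm(f, g_nodes.transpose(1, 2)) / n                   # Y_i = (1/N) sum_j f*g
        y = F.fold(y.transpose(1, 2), (h, w), kernel_size=region, stride=region)
        return self.relu(self.w_z(y) + x)                               # Z = ReLU(W_z*Y + X)

feat = torch.randn(1, 64, 256, 256)   # e.g. the conv_1 output of a 1024 x 1024 image
z = GraphModule(64)(feat)             # same 256 x 256 resolution, as stated above
```

The output resolution is unchanged, consistent with the statement that the module does not change the resolution of the feature map.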
(2) A Faster R-CNN structure combined with FPN is designed, applying the feature pyramid network FPN to ResNet101. The concrete structure (as shown in FIG. 3) is as follows: the ResNet101 network comprises five convolution modules (conv_1, conv_2, conv_3, conv_4, conv_5), and the output feature maps of the last residual block of the last four modules are extracted, denoted C2, C3, C4 and C5. Taking the 1024 × 1024 input PCB image as an example, the feature map sizes from C2 to C5 are, in turn, 256 × 256 × 256, 128 × 128 × 512, 64 × 64 × 1024 and 32 × 32 × 2048; compared with the original image, the C2 layer is reduced 4 times, C3 8 times, C4 16 times and C5 32 times. C2, C3, C4 and C5 each pass through a 1 × 1 convolution kernel so that the number of channels in every layer is unified to 256 without changing the feature map size. The low-resolution, strongly semantic feature maps of the higher layers are upsampled by 2× nearest-neighbor interpolation to the size of the layer below and added element-wise to the high-resolution, weakly semantic feature maps of the lower layers, giving the preliminary P2, P3 and P4 layers. Each merged feature map (i.e., P2, P3 and P4) is then passed through a 3 × 3 convolution kernel to weaken the aliasing effect of upsampling, giving the final P2, P3 and P4 layers. The P5 layer is obtained directly, without upsampling or the 3 × 3 convolution. The P6 layer is obtained by 0.5× down-sampling of the P5 layer and has size 16 × 16 × 256.
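A minimal sketch of this top-down structure, assuming the stated channel counts (256/512/1024/2048, unified to 256); stride-2 max pooling stands in for the 0.5× down-sampling of P5, which the text does not specify further.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in range(3)])  # 3x3 convs for P2-P4 only

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)   # obtained directly, no upsampling or 3x3 conv
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode='nearest')
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode='nearest')
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode='nearest')
        p2, p3, p4 = (self.smooth[i](p) for i, p in enumerate((p2, p3, p4)))
        p6 = F.max_pool2d(p5, kernel_size=1, stride=2)  # 0.5x down-sampling of P5
        return p2, p3, p4, p5, p6
```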
Next, the RPN generates a series of Proposals through its Anchor mechanism in the five feature layers of different sizes P2, P3, P4, P5 and P6, each layer making independent target candidate box predictions, and finally the prediction results of all layers are connected and fused. The RPN structure is a 3 × 3 convolution layer followed by two convolution output branches: the left branch outputs the probability that a candidate region is a target, and the right branch outputs the top-left coordinates and the width and height of the candidate bounding box. During RPN training, candidates whose intersection-over-union with a labelled box exceeds 0.7 receive a positive label (target) and those below 0.3 a negative label (background). Under the feature pyramid network FPN, the Anchor boxes use the three aspect ratios 1:1, 2:1 and 1:2, and, according to the sizes of the electronic components, the Anchor side lengths of the five prediction layers are 32, 64, 128, 256 and 512 respectively, giving 15 Anchors of different shapes in total.
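A small sketch of the Anchor shapes just described: the three aspect ratios at each of the five side lengths give the stated 15 shapes. The centering convention and the area-preserving handling of the ratios are assumptions of the sketch.

```python
def make_anchors(cx, cy, side):
    """Return (x1, y1, x2, y2) anchor boxes of the three stated aspect ratios,
    centered at (cx, cy), with area side * side preserved across ratios."""
    boxes = []
    for rw, rh in [(1, 1), (2, 1), (1, 2)]:           # ratios 1:1, 2:1, 1:2
        w = side * (rw / rh) ** 0.5
        h = side * (rh / rw) ** 0.5
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Side lengths 32..512 on P2..P6: 5 levels x 3 ratios = 15 anchor shapes in total.
anchors = [make_anchors(0, 0, s) for s in (32, 64, 128, 256, 512)]
```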
Each Proposal box generated by the RPN is mapped, according to its area (w × h), to the corresponding feature layer P_k, on which the next ROI Pooling step is performed. The value of k is calculated as follows, where k_0 = 4 and w and h are the width and height of the bounding box:

k = ⌊k_0 + log2(√(w·h) / 224)⌋

(final value range of k: 2, 3, 4, 5)
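The assignment rule can be computed directly; k_0 = 4 and the clamp to layers P2-P5 follow the text, while the canonical size 224 is assumed from the standard FPN formulation.

```python
import math

def fpn_level(w, h, k0=4, canonical=224):
    """Map a Proposal of size w x h to its pooling layer index k (P2..P5)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(max(k, 2), 5)

fpn_level(32, 32)    # -> 2: a 32 x 32 Proposal is pooled from the P2 layer
fpn_level(448, 448)  # -> 5: a large Proposal is pooled from the P5 layer
```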
ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7, guaranteeing that the features entering the fully connected layers have a consistent size. After each feature map sample passes through two 1024-d fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, yielding the class of the electronic component, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target. The loss function L is calculated and the parameters of the whole network are updated to obtain the trained model; the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise.

The factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal.
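A sketch of this combined loss, assuming per-Proposal class scores, box deltas and labels as inputs; normalizing the regression term by the number of positive samples and taking λ = 1 are illustrative choices, not values fixed by the text.

```python
import torch
import torch.nn.functional as F

def smooth_l1(x):
    # 0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise
    return torch.where(x.abs() < 1, 0.5 * x * x, x.abs() - 0.5)

def detection_loss(cls_scores, labels, box_deltas, box_targets, lam=1.0):
    l_cls = F.cross_entropy(cls_scores, labels)       # averaged over N_cls samples
    pos = (labels > 0).float().unsqueeze(1)           # p_i*: activates L_reg for positives
    l_reg = (pos * smooth_l1(box_deltas - box_targets)).sum() / pos.sum().clamp(min=1)
    return l_cls + lam * l_reg
```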
(3) A similarity measurement module based on triplet loss is constructed to replace the classification branch at the end of the Faster RCNN network, as shown in FIG. 3. The selection strategy for the triplets is crucial: randomly selecting positive and negative samples easily makes model convergence slow, while mining only hard examples easily causes model collapse. Therefore, a semi-hard mining strategy is adopted to select the triplets (a, p, n), where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is then:

L = max(d(a, p) - d(a, n) + margin, 0)

Centering on the semi-hard region, a target of a different class with larger similarity to the sample is selected as the negative pair, and a target of the same class with the smallest similarity to the sample is selected as the positive pair, i.e., the selection satisfies d(a, p) < d(a, n) < d(a, p) + margin.
On this basis, a triplet convolutional neural network is designed: the selected triplets are input into three convolutional neural networks with identical structure and shared weights, where the shared network consists of an input layer, two convolution layers with 3 × 3 kernels, a max pooling layer and a fully connected layer. Through the triplet loss and model training, the network learns discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module. The original Softmax classification structure is replaced: this module is embedded into the classification branch at the end of the Faster R-CNN model and obtains the class of each target, thereby improving the classification precision for small targets.
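A sketch of the triplet branch, assuming Euclidean distance for d and hypothetical channel sizes; the shared network mirrors the stated structure (two 3 × 3 convolution layers, max pooling, a fully connected layer), but its exact dimensions are not given in the text.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """One shared network; applying it to a, p and n realizes the
    'three networks with the same structure and shared weights'."""
    def __init__(self, in_channels=256, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                         # 7x7 ROI features -> 3x3
            nn.Flatten(),
            nn.Linear(128 * 3 * 3, dim))

    def forward(self, x):
        return self.net(x)

def triplet_loss(net, a, p, n, margin=0.2):
    d_ap = (net(a) - net(p)).pow(2).sum(1).sqrt()    # d(a, p)
    d_an = (net(a) - net(n)).pow(2).sum(1).sqrt()    # d(a, n)
    # Semi-hard triplets satisfy d(a,p) < d(a,n) < d(a,p) + margin.
    return torch.clamp(d_ap - d_an + margin, min=0).mean()
```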
(4) The deep neural network obtained in the above three steps is trained end-to-end on the training and validation sets of the PCB data set; for each picture input to the neural network, the forward and backward propagation steps are executed and the internal parameters of the model are updated based on the loss function L({p_i}, {t_i}), yielding a multi-class small target detection model for detecting the electronic components on PCB images.
Step three, the test set of the PCB data set is input as test examples into the trained deep neural network model, and the electronic component targets in the PCB images are detected. The specific process is as follows:
(1) A group of PCB images to be tested is input, with the maximum side length of the input image limited to 1024; after feature extraction by the ResNet network and the FPN network, 400 candidate target regions (Proposals) are obtained in the image through the RPN;
(2) ROI Pooling takes the original image feature maps and each candidate target region as input, extracts the feature maps of the candidate target regions, and outputs 7 × 7 feature maps of uniform size for the subsequent detection box regression and target classification;
(3) The similarity measurement module based on triplet loss takes the Proposal features as input, extracts discriminative features through the shared network, and outputs the corresponding target classes; the accurate rectangular position of each target detection box is obtained by passing the Proposal feature information through the fully connected layers and the box regression. Finally, all circumscribed rectangles marked as electronic component targets, together with their classes, are drawn in the original image;
(4) The results are evaluated with the average precision (AP) and the mean average precision (mAP). False Negative (FN): judged a negative sample but actually a positive sample; False Positive (FP): judged a positive sample but actually a negative sample; True Negative (TN): judged a negative sample and actually a negative sample; True Positive (TP): judged a positive sample and actually a positive sample. Precision = TP / (TP + FP) and Recall = TP / (TP + FN); the two-dimensional curve with precision and recall as the vertical and horizontal axes is the precision-recall (P-R) curve. The average precision AP of each category is the area under the P-R curve of that category, and the mean average precision mAP is the mean of the AP values over all categories.
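A sketch of the per-class AP computation described above, assuming the detections of one class have already been matched to ground truth (a TP/FP flag per detection, sorted by confidence); the monotone precision envelope is a common implementation choice and is not specified in the text.

```python
import numpy as np

def average_precision(tp_flags, n_gt):
    """Area under the precision-recall curve for one class.
    tp_flags: 1 for a true positive, 0 for a false positive, sorted by confidence.
    n_gt: number of ground-truth boxes of this class (TP + FN)."""
    tp_flags = np.asarray(tp_flags, dtype=float)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1 - tp_flags)
    recall = tp / n_gt                 # TP / (TP + FN)
    precision = tp / (tp + fp)         # TP / (TP + FP)
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, envelope):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# mAP is the mean of the per-class AP values over all categories.
```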

Claims (3)

1. A multi-class small target detection method based on metric learning is characterized by comprising the following steps:
Step 1: construct the multi-class small target data set: photograph various PCBs of different models with an industrial camera and store the images in JPEG format; establish a classification criterion for the electronic components according to their different types and packaging forms, and use Labelme software for image annotation to obtain annotation files in xml format; expand the number of PCB images with affine transformations to obtain the PCB image data set; make the PCB image data set and the xml annotation files into the VOC2007 data set format;
Step 2: construct the graph network module and embed it into the ResNet101 network of the Faster RCNN, where the ResNet101 network comprises the five convolution modules conv_1, conv_2, conv_3, conv_4 and conv_5; the concrete steps are as follows: design the similarity calculation function and the graph convolution layer structure, and divide the output feature map of the upper convolution layer into a grid of N small blocks, each grid cell serving as an input node of the graph network; that is, X represents the output feature map of the upper convolution layer and is divided into N regions of equal size, where node X_i represents the i-th of the N regions of the feature map, node X_j represents the j-th region of feature map X, and in the formulas X_j denotes any one of the N-1 regions other than X_i;
the similarity calculation function f(X_i, X_j) transmits the node information to obtain the edge feature matrix Y_i containing the similarity relations between nodes:

Y_i = (1/C(X)) · Σ_j f(X_i, X_j) · g(X_j)

where C(X) is the normalization operation, with value N, and g(X_j) is a convolution operation on X_j with a 1 × 1 kernel;
the node feature X_i and the edge feature Y_i are input into the graph convolution layer together to obtain the new node feature Z_i:

Z_i = ReLU(W_z · Y_i + X_i)

where W_z is the parameter matrix that embeds the features into a matching dimension; the node feature Z_i then contains both the feature information of node X_i itself and the correlation information between the region X_i and the other regions X_j;
the above calculation is performed for every region node, finally yielding N new node outputs Z_i, which form the corresponding feature map Z; Z is equal in size to the input feature map X and contains the correlation information between the regions;
three graph network modules are embedded after the first three convolution modules conv_1, conv_2 and conv_3 of the ResNet101 network, respectively;
Step 3: design the Faster R-CNN structure combined with FPN, applying the feature pyramid network FPN in ResNet101: extract the output feature maps of the last residual block of each of the last four modules of the ResNet101 network, denoted C2, C3, C4 and C5; pass the C2, C3, C4 and C5 layers through 1 × 1 convolution kernels; upsample the low-resolution, strongly semantic feature maps of the higher layer by nearest-neighbor interpolation to the size of the layer below, add them element-wise to the high-resolution, weakly semantic feature maps of the lower layer, and convolve with a 3 × 3 kernel to obtain the P2, P3, P4 and P5 layers respectively; the P6 layer is obtained by 0.5× down-sampling of the P5 layer;
next, the RPN generates a series of region candidate boxes in the five feature layers P2, P3, P4, P5 and P6 through its anchor mechanism, and finally the prediction results of all layers are connected and fused; each Proposal generated by the RPN is mapped to the corresponding feature layer according to its area, and the region of interest pooling (ROI Pooling) operation is performed next; ROI Pooling extracts the features of each Proposal and outputs Proposal feature map samples of fixed size 7 × 7;
after each feature map sample passes through two fully connected layers, it is processed by the two terminal branches of the Faster RCNN: the classification loss function classifies the specific category, and the L1 loss completes the bounding-box regression to obtain the accurate position of each target; the loss function L is calculated and the parameters of the whole network are updated to obtain the trained model, where the training loss comprises the classification loss and the regression loss, calculated as:
L({p_i}, {t_i}) = (1/N_cls) · Σ_i L_cls(p_i, p_i*) + λ · (1/N_reg) · Σ_i p_i* · L_reg(t_i, t_i*)

where i is the index of each sample, N_cls and N_reg are normalization parameters, and λ is the weight-balancing parameter; L_cls denotes the classification loss; p_i denotes the predicted probability that the sample belongs to a certain class, and p_i* is the annotated ground-truth label; L_reg denotes the bounding-box regression loss, defined as Smooth_L1(t - t*), where the Smooth_L1 function is

Smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise;

the factor p_i* means that the regression loss is activated only when the sample is a positive sample, i.e. p_i* = 1; t_i = {t_x, t_y, t_w, t_h} denotes the translation and scaling parameters of the Proposal prediction box, and t_i* denotes the translation and scaling parameters of the ground truth corresponding to the Proposal;
Step 4: construct the similarity measurement module based on triplet loss and replace the classification branch at the end of the Faster RCNN network; select the triplets (a, p, n) with a semi-hard mining strategy, where a is the target anchor box (Anchor), p (Positive) is a sample of the same class as a, and n (Negative) is a sample of a different class from a; the triplet loss function is

L = max(d(a, p) - d(a, n) + margin, 0)

centering on the semi-hard region, samples are selected to satisfy d(a, p) < d(a, n) < d(a, p) + margin;
on this basis, design the convolutional neural network: input the selected triplets into three convolutional neural networks with identical structure and shared weights, and let the network learn, through the triplet loss and model training, discriminative features sufficient to distinguish the detail information between classes, yielding the similarity measurement module; embed this module at the back end of the Faster R-CNN model, replacing the original normalized exponential function classification structure, and perform label classification of the regions of interest to obtain the class of each target;
Step 5: train the deep neural network obtained in steps 2-4 end-to-end on the training and validation sets of the PCB data set; for each picture input to the neural network, execute the forward and backward propagation steps and update the internal parameters of the model based on the loss function L({p_i}, {t_i}), obtaining a multi-class small target detection model for detecting the electronic components on PCB images;
Step 6: input the test set of the PCB data set into the trained deep neural network model and detect the electronic component targets in the PCB images.
2. The method of claim 1, wherein the quantity expansion in step 1 comprises random cropping, rotation and flipping.
3. The method of claim 1, wherein N is 1024.
CN202010583655.1A 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning Pending CN111783590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010583655.1A CN111783590A (en) 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010583655.1A CN111783590A (en) 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning

Publications (1)

Publication Number Publication Date
CN111783590A true CN111783590A (en) 2020-10-16

Family

ID=72757220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010583655.1A Pending CN111783590A (en) 2020-06-24 2020-06-24 Multi-class small target detection method based on metric learning

Country Status (1)

Country Link
CN (1) CN111783590A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122003A (en) * 2017-12-19 2018-06-05 西北工业大学 A kind of Weak target recognition methods based on deep neural network
CN109711474A (en) * 2018-12-24 2019-05-03 中山大学 A kind of aluminium material surface defects detection algorithm based on deep learning
CN110070536A (en) * 2019-04-24 2019-07-30 南京邮电大学 A kind of pcb board component detection method based on deep learning
CN110287998A (en) * 2019-05-28 2019-09-27 浙江工业大学 A kind of scientific and technical literature picture extracting method based on Faster-RCNN
CN110321815A (en) * 2019-06-18 2019-10-11 中国计量大学 A kind of crack on road recognition methods based on deep learning
CN110348437A (en) * 2019-06-27 2019-10-18 电子科技大学 It is a kind of based on Weakly supervised study with block the object detection method of perception
CN110910421A (en) * 2019-11-11 2020-03-24 西北工业大学 Weak and small moving object detection method based on block characterization and variable neighborhood clustering
CN111242026A (en) * 2020-01-13 2020-06-05 中国矿业大学 Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN111310756A (en) * 2020-01-20 2020-06-19 陕西师范大学 Damaged corn particle detection and classification method based on deep learning
CN111274972A (en) * 2020-01-21 2020-06-12 北京妙医佳健康科技集团有限公司 Dish identification method and device based on metric learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

Title
BING HU et al.: "Detection of PCB Surface Defects with Improved Faster-RCNN and Feature Pyramid Network", Digital Object Identifier, 31 December 2017 *
CHIA-WEN KUO et al.: "Data-Efficient Graph Embedding Learning for PCB Component Detection", arXiv:1811.06994v2, 20 November 2018, p. 1 *
XIAOLONG WANG et al.: "Non-local Neural Networks", CVPR, 31 December 2018, pp. 7794-7803 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180903A (en) * 2020-10-19 2021-01-05 江苏中讯通物联网技术有限公司 Vehicle state real-time detection system based on edge calculation
CN112418278A (en) * 2020-11-05 2021-02-26 中保车服科技服务股份有限公司 Multi-class object detection method, terminal device and storage medium
CN112364754B (en) * 2020-11-09 2024-05-14 云南电网有限责任公司迪庆供电局 Bolt defect detection method and system
CN112364754A (en) * 2020-11-09 2021-02-12 云南电网有限责任公司迪庆供电局 Bolt defect detection method and system
CN112364778A (en) * 2020-11-12 2021-02-12 上海明华电力科技有限公司 Power plant safety behavior information automatic detection method based on deep learning
CN112836719B (en) * 2020-12-11 2024-01-05 南京富岛信息工程有限公司 Indicator diagram similarity detection method integrating two classifications and triplets
CN112836719A (en) * 2020-12-11 2021-05-25 南京富岛信息工程有限公司 Indicator diagram similarity detection method fusing two classifications and three groups
CN112560853A (en) * 2020-12-14 2021-03-26 中科云谷科技有限公司 Image processing method, device and storage medium
CN112669264A (en) * 2020-12-17 2021-04-16 国网山西省电力公司运城供电公司 Artificial intelligence defect identification method and system for unmanned aerial vehicle routing inspection of distribution network line
CN112800955A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112800257A (en) * 2021-02-10 2021-05-14 上海零眸智能科技有限公司 Method for quickly adding sample training data based on image searching
CN112949520A (en) * 2021-03-10 2021-06-11 华东师范大学 Aerial photography vehicle detection method and detection system based on multi-scale small samples
CN112801058A (en) * 2021-04-06 2021-05-14 艾伯资讯(深圳)有限公司 UML picture identification method and system
CN112801058B (en) * 2021-04-06 2021-06-29 艾伯资讯(深圳)有限公司 UML picture identification method and system
CN113283513A (en) * 2021-05-31 2021-08-20 西安电子科技大学 Small sample target detection method and system based on target interchange and metric learning
CN113283513B (en) * 2021-05-31 2022-12-13 西安电子科技大学 Small sample target detection method and system based on target interchange and metric learning
CN113361437A (en) * 2021-06-16 2021-09-07 吉林建筑大学 Method and system for detecting category and position of minimally invasive surgical instrument
CN113420648A (en) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN113487551A (en) * 2021-06-30 2021-10-08 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving performance of dense target based on deep learning
CN113487551B (en) * 2021-06-30 2024-01-16 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving dense target performance based on deep learning
CN113435389A (en) * 2021-07-09 2021-09-24 大连海洋大学 Chlorella and chrysophyceae classification and identification method based on image feature deep learning
CN113435389B (en) * 2021-07-09 2024-03-01 大连海洋大学 Chlorella and golden algae classification and identification method based on image feature deep learning
CN113469272B (en) * 2021-07-20 2023-05-19 东北财经大学 Target detection method for hotel scene picture based on fast R-CNN-FFS model
CN113657174A (en) * 2021-07-21 2021-11-16 北京中科慧眼科技有限公司 Vehicle pseudo-3D information detection method and device and automatic driving system
CN113313082A (en) * 2021-07-28 2021-08-27 北京电信易通信息技术股份有限公司 Target detection method and system based on multitask loss function
CN115100419A (en) * 2022-07-20 2022-09-23 中国科学院自动化研究所 Target detection method and device, electronic equipment and storage medium
CN115965915A (en) * 2022-11-01 2023-04-14 哈尔滨市科佳通用机电股份有限公司 Wagon connecting pull rod fracture fault identification method and system based on deep learning
CN115965915B (en) * 2022-11-01 2023-09-08 哈尔滨市科佳通用机电股份有限公司 Railway wagon connecting pull rod breaking fault identification method and system based on deep learning
CN115984846A (en) * 2023-02-06 2023-04-18 山东省人工智能研究院 Intelligent identification method for small target in high-resolution image based on deep learning
CN115984846B (en) * 2023-02-06 2023-10-10 山东省人工智能研究院 Intelligent recognition method for small targets in high-resolution image based on deep learning

Similar Documents

Publication Publication Date Title
CN111783590A (en) Multi-class small target detection method based on metric learning
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN109285139A (en) A kind of x-ray imaging weld inspection method based on deep learning
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN110246181B (en) Anchor point-based attitude estimation model training method, attitude estimation method and system
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN109635726B (en) Landslide identification method based on combination of symmetric deep network and multi-scale pooling
CN113313094B (en) Vehicle-mounted image target detection method and system based on convolutional neural network
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
Li et al. A review of deep learning methods for pixel-level crack detection
CN114998566A (en) Interpretable multi-scale infrared small and weak target detection network design method
CN110738132A (en) target detection quality blind evaluation method with discriminant perception capability
CN115620141A (en) Target detection method and device based on weighted deformable convolution
CN116342894A (en) GIS infrared feature recognition system and method based on improved YOLOv5
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
CN113361496B (en) City built-up area statistical method based on U-Net
CN115147644A (en) Method, system, device and storage medium for training and describing image description model
CN113361528B (en) Multi-scale target detection method and system
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN114187506A (en) Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network
CN117372898A (en) Unmanned aerial vehicle aerial image target detection method based on improved yolov8
CN116258877A (en) Land utilization scene similarity change detection method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201016)