CN116229142A - Online target detection method and system based on depth feature matching - Google Patents
- Publication number: CN116229142A (application CN202211664409.4A)
- Authority: CN (China)
- Prior art keywords: image, feature map, regression, target sample, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/764: Image or video recognition using machine-learning classification, e.g. of video objects
- G06V10/765: Image or video recognition using rules for classification or partitioning the feature space
- G06N3/08: Neural networks; learning methods
- G06V10/40: Extraction of image or video features
- G06V10/82: Image or video recognition using neural networks
- G06V2201/07: Target detection
- Y02T10/40: Engine management systems (climate-change mitigation indexing)
Abstract
The invention provides an online target detection method and system based on depth feature matching, comprising the following steps: acquiring a new class target sample image and an image to be detected; inputting the new class target sample image and the image to be detected into a trained neural network backbone network and region generation network to obtain a classification activation feature map and a regression activation feature map of the image to be detected; and obtaining the category and position frame of the new class target sample image in the image to be detected based on the classification activation feature map and the regression activation feature map. The invention avoids the complicated data acquisition, data annotation and offline model training that a new target class would otherwise require: one or a few new class target sample images, passed through the neural network backbone network and the region generation network, suffice to obtain the category and position frame of the new class target in the image to be detected. The method is therefore particularly suitable for video surveillance, unmanned aerial vehicle ground observation, and high-value target discovery in remote sensing images.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an online target detection method and system based on depth feature matching.
Background
Deep learning has been widely applied across many fields of society. Target detection is one of the fundamental problems in computer vision, and the best-performing detectors today are based on deep learning. Current detectors depend on large amounts of annotated data for training; large datasets exist for categories such as faces, pedestrians and vehicles, but for new categories and new samples such datasets are difficult to construct. Like most deep learning algorithms, deep-learning-based target detection requires large quantities of labeled data for supervised training, and when labeled data are scarce, the accuracy and generalization ability of the algorithm are difficult to guarantee. When enough samples can be obtained, the data can be labeled manually, automatically or semi-automatically to produce a large annotated dataset. In some application scenarios, however, many samples are simply unavailable: for example, when searching for a suspicious person in surveillance video, only one or a few pictures of that person exist, large-scale deep learning training is impossible, an effective deep learning model is hard to obtain, and the applicability of deep-learning-based target detection is greatly limited.
At present, targets for which only one or a few samples exist are usually detected by template matching. Template matching is the most basic pattern recognition method and the most fundamental and commonly used matching technique in image processing: the pattern of a specific object is sought and localized in an image, thereby recognizing the object. Template matching has inherent limitations, chiefly that it can only handle translation; if the target in the image is rotated or changed in size, the algorithm's performance drops sharply. In other words, template matching generally lacks rotation invariance. Deep learning methods, by contrast, are now widely applied, and the feature maps extracted by deep neural networks exhibit translation and rotation invariance, which has led to very good results in the field of target detection.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an online target detection method based on depth feature matching, which comprises the following steps:
acquiring a new class target sample image and an image to be detected;
inputting the new class target sample image and the image to be detected into the trained neural network backbone network and region generation network to obtain a classification activation feature map and a regression activation feature map of the image to be detected;
obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image cropped from the image to be detected; the neural network backbone network and the region generation network are trained on old class target sample images and images to be detected, and the trained weights are retained.
Preferably, the region generation network includes a classification branch and a regression branch, and its training includes:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and, based on the first classification feature map, calculating the cross entropy loss between the preset anchor frame categories and the true category labels of the targets in the image to be detected;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and, based on the first regression feature map, calculating the loss between the preset anchor frame positions and the true target positions in the image to be detected;
and training the region generation network with stochastic gradient descent on the cross entropy loss and the regression loss, obtaining the weights of the region generation network.
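The training objective above can be sketched in code. This is a hedged illustration, not the patent's implementation: the function names and the exact label and probability conventions are assumptions.

```python
import numpy as np

def anchor_classification_loss(p, c):
    """Cross entropy over anchor scores.
    p: predicted probability that each anchor contains the target, shape (N,)
    c: ground-truth labels (1 = target, 0 = background), shape (N,)"""
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(c * np.log(p) + (1 - c) * np.log(1 - p))

def smooth_l1(x, sigma=3.0):
    """Smooth L1 regression loss applied elementwise to anchor offsets."""
    beta = 1.0 / sigma ** 2
    x = np.asarray(x)
    return np.where(np.abs(x) < beta,
                    0.5 * sigma ** 2 * x ** 2,
                    np.abs(x) - 0.5 * beta)

def detection_loss(p, c, deltas, lam=1.0):
    """Combined objective: classification loss plus weighted regression loss."""
    return anchor_classification_loss(p, c) + lam * smooth_l1(deltas).sum()
```

In practice both terms would be minimized jointly by stochastic gradient descent over many sample/search-image pairs.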
Preferably, inputting the new class target sample image and the image to be detected into the trained neural network backbone network and region generation network to obtain the classification activation feature map and the regression activation feature map of the image to be detected includes:
inputting the new class target sample image into the trained neural network backbone network to obtain a first depth feature map; inputting the first depth feature map into the region generation network classification branch to generate a classification convolution kernel, and into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be detected into the trained neural network backbone network to obtain a second depth feature map; inputting the second depth feature map into the region generation network classification branch and generating the classification activation feature map with the classification convolution kernel, and inputting the second depth feature map into the region generation network regression branch and generating the regression activation feature map with the regression convolution kernel.
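The activation feature maps are produced by sliding the kernels derived from the sample image's features over the feature maps of the image to be detected. A minimal NumPy sketch of this valid cross-correlation follows; the shapes and names are illustrative, and a real implementation would use a framework's batched convolution instead of Python loops.

```python
import numpy as np

def cross_correlate(search_feat, kernel):
    """Valid cross-correlation of one kernel over a search feature map.
    search_feat: (C, H, W) depth features of the image to be detected
    kernel:      (C, h, w) kernel built from the sample image's features
    returns:     (H - h + 1, W - w + 1) activation map"""
    C, H, W = search_feat.shape
    _, h, w = kernel.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search_feat[:, i:i + h, j:j + w] * kernel)
    return out

# With the shapes quoted in the embodiment (256 channels, 22x22 search
# features, 4x4 kernels), each kernel yields a 19x19 activation map:
activation = cross_correlate(np.random.rand(256, 22, 22), np.random.rand(256, 4, 4))
assert activation.shape == (19, 19)
```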
Preferably, obtaining the category and position frame of the new class target sample image in the image to be detected based on the classification activation feature map and the regression activation feature map includes:
activating the values of the classification activation feature map with an activation function and mapping the result onto the image to be detected, giving the category of the new class target sample image;
and mapping the regression activation feature map onto the image to be detected based on the anchor frame positions obtained during training, giving the position frame of the new class target sample image.
Preferably, the activation function is as follows:

softmax(x_i) = exp(x_i - max(x)) / Σ_{j=1}^{C} exp(x_j - max(x))

where softmax(x_i) is the activation of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value among the input features, and C is the number of categories.
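Subtracting max(x) inside the exponentials, as the formula does, is the standard numerical-stability trick: the shift cancels in the ratio but prevents overflow in exp. A minimal sketch:

```python
import numpy as np

def softmax(x):
    """Softmax with the max(x) subtraction shown in the formula above."""
    e = np.exp(x - np.max(x))
    return e / e.sum()
```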
Based on the same inventive concept, the invention also provides an online target detection system based on depth feature matching, which comprises:
the device comprises an image acquisition module, a feature map acquisition module and a target detection module;
the image acquisition module is used for acquiring a new class target sample image and an image to be detected;
the feature map acquisition module is used for inputting the new class target sample image and the image to be detected into the trained neural network backbone network and region generation network to obtain the classification activation feature map and the regression activation feature map of the image to be detected;
the target detection module is used for obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image cropped from the image to be detected; the neural network backbone network and the region generation network are trained on old class target sample images and images to be detected, and the trained weights are retained.
Preferably, the region generation network in the feature map acquisition module includes a classification branch and a regression branch, and its training includes:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and, based on the first classification feature map, calculating the cross entropy loss between the preset anchor frame categories and the true category labels of the targets in the image to be detected;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and, based on the first regression feature map, calculating the loss between the preset anchor frame positions and the true target positions in the image to be detected;
and training the region generation network with stochastic gradient descent on the cross entropy loss and the regression loss, obtaining the weights of the region generation network.
Preferably, the feature map obtaining module is specifically configured to:
inputting the new class target sample image into the trained neural network backbone network to obtain a first depth feature map; inputting the first depth feature map into the region generation network classification branch to generate a classification convolution kernel, and into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be detected into the trained neural network backbone network to obtain a second depth feature map; inputting the second depth feature map into the region generation network classification branch and generating the classification activation feature map with the classification convolution kernel, and inputting the second depth feature map into the region generation network regression branch and generating the regression activation feature map with the regression convolution kernel.
Preferably, the target detection module is specifically configured to:
activating the values of the classification activation feature map with an activation function and mapping the result onto the image to be detected, giving the category of the new class target sample image;
and mapping the regression activation feature map onto the image to be detected based on the anchor frame positions obtained during training, giving the position frame of the new class target sample image.
Preferably, the activation function of the target detection module is as follows:

softmax(x_i) = exp(x_i - max(x)) / Σ_{j=1}^{C} exp(x_j - max(x))

where softmax(x_i) is the activation of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value among the input features, and C is the number of categories.
Compared with the closest prior art, the invention has the following beneficial effects:
the invention provides an online target detection method and system based on depth feature matching, comprising the following steps: acquiring a new class target sample image and an image to be detected; inputting the new class target sample image and the image to be tested into a neural network backbone network and an area generating network after training is completed, and obtaining a class activation feature map and a regression activation feature map of the image to be tested; obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map; the new class target sample image is an image obtained from the image to be detected; training the neural network backbone network and the area generating network based on the old class target sample image and the image to be tested and obtaining trained weights; the invention avoids complex data acquisition, data annotation and off-line training of an algorithm model of a new class of target sample image, and obtains the class and position frame of the new class of target sample image in the image to be detected by detecting one or a small number of new class of target sample images through a neural network backbone network and a regional generation network, thereby being particularly suitable for video monitoring, unmanned aerial vehicle earth detection and high-value target discovery of remote sensing images.
Drawings
FIG. 1 is a schematic flow chart of an online target detection method based on depth feature matching;
FIG. 2 is a training flowchart of an online target detection algorithm based on depth feature matching in an embodiment provided by the invention;
FIG. 3 is a flowchart of an online target detection algorithm based on depth feature matching in an embodiment provided by the invention;
FIG. 4 is a schematic diagram of a classification characteristic diagram according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a regression feature map according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an online object detection system based on depth feature matching according to the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings.
Example 1:
the on-line target detection method based on depth feature matching provided by the invention is shown in fig. 1, and comprises the following steps:
step 1: acquiring a new class target sample image and an image to be detected;
step 2: inputting the new class target sample image and the image to be detected into the trained neural network backbone network and region generation network to obtain a classification activation feature map and a regression activation feature map of the image to be detected;
step 3: obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image cropped from the image to be detected; the neural network backbone network and the region generation network are trained on old class target sample images and images to be detected, and the trained weights are retained.
Specifically, step 1 includes:
according to the method, when the unknown type target is detected, the target can be detected in the image only by one new type target sample image, so that complicated data acquisition, data labeling and offline training of an algorithm model of the new type target sample image are avoided.
Specifically, step 2 includes:
and selecting image data covering various categories as much as possible from a public database or an actually collected data set, intercepting the category corresponding to the label from the original image to serve as an old category target sample image, and taking the original image as an image to be detected to form an old category target sample image and an image pair to be detected.
Neural network backbone networks include, but are not limited to: AlexNet, VGGNet, GoogLeNet, ResNet (residual network), ResNeXt, ResNeSt, DenseNet (densely connected network), SqueezeNet, ShuffleNet, MobileNet, EfficientNet, and Transformer. The region generation network includes a classification branch and a regression branch.
The training stage for the neural network backbone network and the region generation network, as shown in fig. 2, includes:
The existing old class target sample image is resized to 1×3×127×127 and input into the neural network backbone network (AlexNet in this embodiment), yielding a depth feature map of the old class target sample image of size 1×256×6×6. The classification branch and the regression branch of the region generation network each contain a convolution unit comprising a two-dimensional convolution, a normalization operation and an activation operation. The depth feature map is input into the classification branch to obtain a classification feature map of the old class sample image of size 1×2560×4×4, which after a suitable matrix reshaping yields a classification convolution kernel of size 10×256×4×4; the depth feature map is likewise input into the regression branch to obtain a regression feature map of size 1×5120×4×4, which after reshaping yields a regression convolution kernel of size 20×256×4×4.
The image to be detected is resized to 1×3×271×271; its resolution must exceed that of the old class target sample image, preferably by more than a factor of two. It is then input into AlexNet, yielding a depth feature map of size 1×256×24×24. This depth feature map is input into the region generation network classification branch to generate a classification feature map of the image to be detected of size 1×256×22×22, and into the regression branch to generate a regression feature map of size 1×256×22×22. Convolving the classification feature map of the image to be detected with the classification convolution kernel of the old class sample image gives the classification activation feature map of the image to be detected, of size 1×10×19×19 with 2k channels; convolving the regression feature map with the regression convolution kernel gives the regression activation feature map, of size 1×20×19×19 with 4k output channels.
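The 19×19 activation maps and the channel counts follow directly from valid-correlation arithmetic. A quick check of the shape bookkeeping; the 3×3 convolution taking the 24×24 depth features to 22×22 branch features is an assumption consistent with the sizes quoted, not stated in the patent.

```python
def valid_corr_size(feat, kernel, stride=1):
    """Spatial output size of a valid (no padding) convolution/correlation."""
    return (feat - kernel) // stride + 1

k = 5                                  # anchors per position
assert 2 * k == 10                     # classification activation channels
assert 4 * k == 20                     # regression activation channels
assert valid_corr_size(22, 4) == 19    # 4x4 kernels over 22x22 features -> 19x19
assert valid_corr_size(24, 3) == 22    # a 3x3 conv unit would map 24x24 to 22x22
```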
The number of anchor frames is set to 5, with aspect ratios [0.33, 0.5, 1, 2, 3]. The cross entropy loss is computed over the five anchor frames at each of the 19×19 pixel positions of the classification feature map:

L_cls = -(1/N) Σ_{i=1}^{N} [c_i log(p_i) + (1 - c_i) log(1 - p_i)]

where L_cls is the cross entropy loss function, N is the number of samples, c_i is the class label of sample i (1 if it contains the target, 0 otherwise), and p_i is the predicted probability that sample i is 1;
The smooth L1 loss is computed over the five anchor frames at each pixel position of the regression feature map:

smooth_L1(x, σ) = 0.5 σ² x²  if |x| < 1/σ²,  otherwise |x| - 1/(2σ²)

where smooth_L1(x, σ) is the target detection regression loss function, x is the input feature, and σ is an adjustment parameter;
The offset of the center-point abscissa between the target frame and the anchor frame is:

δ[0] = (T_x - A_x) / A_w

where δ[0] is the offset of the center-point abscissa between the target frame and the anchor frame, T_x is the abscissa of the target frame's center point, A_x is the abscissa of the anchor frame's center point, and A_w is the width of the anchor frame;
The offset of the center-point ordinate between the target frame and the anchor frame is:

δ[1] = (T_y - A_y) / A_h

where δ[1] is the offset of the center-point ordinate between the target frame and the anchor frame, T_y is the ordinate of the target frame's center point, A_y is the ordinate of the anchor frame's center point, and A_h is the height of the anchor frame;
The width offset between the target frame and the anchor frame is:

δ[2] = ln(T_w / A_w)

where δ[2] is the width offset between the target frame and the anchor frame and T_w is the width of the target frame;
The height offset between the target frame and the anchor frame is:

δ[3] = ln(T_h / A_h)

where δ[3] is the height offset between the target frame and the anchor frame and T_h is the height of the target frame;
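The four offsets define an encoding of a target frame relative to an anchor frame, and decoding inverts it. A sketch assuming (cx, cy, w, h) center/size boxes and logarithmic width/height offsets (the log form is the standard convention consistent with the definitions above, not spelled out in the surviving text):

```python
import numpy as np

def encode(target, anchor):
    """Offsets delta[0..3] between a target frame and an anchor frame,
    both given as (cx, cy, w, h) center/size tuples."""
    tx, ty, tw, th = target
    ax, ay, aw, ah = anchor
    return np.array([(tx - ax) / aw,      # delta[0]
                     (ty - ay) / ah,      # delta[1]
                     np.log(tw / aw),     # delta[2]
                     np.log(th / ah)])    # delta[3]

def decode(delta, anchor):
    """Inverse mapping: recover the predicted frame from the offsets."""
    ax, ay, aw, ah = anchor
    return np.array([ax + delta[0] * aw,
                     ay + delta[1] * ah,
                     aw * np.exp(delta[2]),
                     ah * np.exp(delta[3])])
```

Round-tripping a box through encode and decode returns the original box, which is what makes the offsets usable as regression targets.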
The loss function of the target detection algorithm is:

f_loss = L_cls + λ Σ_{i=0}^{3} smooth_L1(δ[i], σ)

where f_loss is the loss function of the target detection algorithm, λ is a hyperparameter, and smooth_L1(δ[i], σ) is the target detection regression loss of δ[i];
The loss function of the target detection algorithm is differentiated, and stochastic gradient descent (SGD) is used to obtain the weights of the neural network backbone network and the region generation network; these trained weights are used in the detection process below.
The process of online detection of the new class target sample image, as shown in fig. 3, includes:
The new class target sample image is resized to 1×3×127×127 and input into the neural network backbone network, giving a depth feature map of the new class target sample image. The backbone network here is identical to that of the training stage and loads the weights obtained during training. The depth feature map of the new class sample image is input into the region generation network classification branch to obtain a classification feature map of size 1×2560×4×4, which after a suitable matrix reshaping yields a classification convolution kernel of size 10×256×4×4; it is also input into the region generation network regression branch to obtain a regression feature map of size 1×5120×4×4, which after reshaping yields a regression convolution kernel of size 20×256×4×4.
The resolution of the image to be detected is adjusted to 1×3×271×271; the required resolution of the image to be detected is higher than that of the new class target sample image, preferably more than 2 times higher. The image to be detected is then input into AlexNet to obtain a depth feature map of the image to be detected with resolution 1×256×24×24. The depth feature map of the image to be detected is input into the region generation network classification branch to generate a classification feature map of the image to be detected with resolution 1×256×22×22, and into the region generation network regression branch to generate a regression feature map of the image to be detected with resolution 1×256×22×22. The classification convolution kernel of the new class target sample image (10×256×4×4) is convolved over the classification feature map of the image to be detected (1×256×22×22) to obtain the classification activation feature map of the image to be detected, with resolution 1×10×19×19 and 2k channels; the regression convolution kernel of the new class target sample image (20×256×4×4) is convolved over the regression feature map of the image to be detected (1×256×22×22) to obtain the regression activation feature map of the image to be detected, with resolution 1×20×19×19 and 4k output channels. In the invention, the number of channels of the feature map is adjusted through the convolution operation unit of the region generation network, so that the input feature map required by subsequent operations is obtained. After the neural network backbone and the region generation network have been trained in advance, when an unknown class target needs to be detected, the new class target sample image only needs to be passed through the trained backbone and region generation network, and the required classification activation feature map and regression activation feature map are obtained rapidly.
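The kernel-generation and cross-correlation steps above can be sketched as follows. This is a minimal illustration assuming a PyTorch implementation: the tensor shapes follow the text (k = 5 anchors per position), but the branch outputs are stand-in random tensors and the cross-correlation is a plain `conv2d`.

```python
import torch
import torch.nn.functional as F

k = 5  # anchors per position, as in the text

# Template-branch outputs for the new class sample (shapes from the text).
cls_feat = torch.randn(1, 2 * k * 256, 4, 4)   # 1 x 2560 x 4 x 4
reg_feat = torch.randn(1, 4 * k * 256, 4, 4)   # 1 x 5120 x 4 x 4

# "Matrix reshaping operation": turn template features into conv kernels.
cls_kernel = cls_feat.view(2 * k, 256, 4, 4)   # 10 x 256 x 4 x 4
reg_kernel = reg_feat.view(4 * k, 256, 4, 4)   # 20 x 256 x 4 x 4

# Branch feature maps of the image to be detected.
cls_search = torch.randn(1, 256, 22, 22)
reg_search = torch.randn(1, 256, 22, 22)

# Cross-correlation: convolve the search features with the template kernels.
cls_activation = F.conv2d(cls_search, cls_kernel)   # 1 x 10 x 19 x 19 (2k channels)
reg_activation = F.conv2d(reg_search, reg_kernel)   # 1 x 20 x 19 x 19 (4k channels)
```

Note that the 19×19 spatial size falls out of the convolution arithmetic (22 − 4 + 1 = 19), matching the activation-map resolutions stated above.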
Specifically, step 3 includes:
The number and shapes of the anchor frames are consistent with the training stage: there are 5 anchor shapes per position, laid out on a grid of width 19 and height 19, and the aspect ratios of the anchor frames are [0.33, 0.5, 1, 2, 3]. For the classification feature map output by the region generation network classification branch, as shown in fig. 4, the feature tensor of the classification feature map is:

CLS = { cls_{i,j}^l }

where CLS is the feature tensor of the classification feature map, i is the i-th abscissa corresponding to the anchor frame, j is the j-th ordinate corresponding to the anchor frame, cls_{i,j}^l is the l-th target probability value, i ∈ [0, 19), j ∈ [0, 19), l ∈ [0, 10);
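The anchor layout described above (5 aspect ratios on a 19×19 grid) can be enumerated as follows; the base side length is a hypothetical value, since the text does not specify the anchor scale.

```python
ratios = [0.33, 0.5, 1, 2, 3]    # aspect ratios from the text
grid_w, grid_h = 19, 19          # anchor grid width and height
base = 64                        # hypothetical base side length (not given in the text)

# Five anchor shapes with roughly constant area and varying aspect ratio.
shapes = [(base / r ** 0.5, base * r ** 0.5) for r in ratios]

# One anchor of every shape centred on each grid position (i, j).
anchors = [(i, j, w, h) for i in range(grid_w)
                        for j in range(grid_h)
                        for (w, h) in shapes]
```

This yields 19 × 19 × 5 = 1805 candidate anchors, one set of 5 shapes per position, matching the index ranges i ∈ [0, 19), j ∈ [0, 19) used above.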
The odd channels represent the probability that a target is present in the anchor frame at that position. The softmax activation function is used to select the several largest category feature values on the odd channels; the softmax activation function is as follows:

softmax(x_i) = e^(x_i − max(x)) / Σ_{j=1}^{C} e^(x_j − max(x))

where softmax(x_i) is the activation value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value among the input features, and C is the number of categories.
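A direct sketch of this max-subtracted softmax in plain Python (subtracting max(x) before exponentiating is what keeps the computation numerically stable):

```python
import math

def softmax(x):
    """Stable softmax: subtract max(x) before exponentiating to avoid overflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # probabilities over C = 3 categories
```

Even for large inputs such as `softmax([1000.0, 1000.0])` the subtraction keeps every exponent at or below zero, so no overflow occurs.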
Let the positions corresponding to the several largest category feature values be:

CLS* = { (i, j, l) : cls_{i,j}^l is among the several largest category feature values }

where CLS* is the set of positions corresponding to the several largest category feature values;
For the regression feature map output by the region generation network regression branch, as shown in fig. 5, the feature tensor of the regression feature map is:

REG = { (dx_{i,j}^p, dy_{i,j}^p, dw_{i,j}^p, dh_{i,j}^p) }

where REG is the feature tensor of the regression feature map, i is the i-th abscissa corresponding to the anchor frame, j is the j-th ordinate corresponding to the anchor frame, dx_{i,j}^p is the p-th offset of the abscissa of the center point between the target frame and the anchor frame, dy_{i,j}^p is the p-th offset of the ordinate of the center point between the target frame and the anchor frame, dw_{i,j}^p is the p-th offset of the width between the target frame and the anchor frame, dh_{i,j}^p is the p-th offset of the height between the target frame and the anchor frame, with i ∈ [0, 19), j ∈ [0, 19), p ∈ [0, 5);
The anchor frame set obtained according to the region generation network classification branch is:

ANCHOR* = { (x_i^an, y_j^an, w^an, h^an) }

where ANCHOR* is the set of anchor frames obtained by the region generation network classification branch, x_i^an is the i-th abscissa of the center point of the anchor frame, y_j^an is the j-th ordinate of the center point of the anchor frame, w^an is the width of the anchor frame, and h^an is the height of the anchor frame;
The set of position offsets and width and height offsets of the corresponding target frame positions relative to the anchor frames can be obtained from the regression feature map as:

REGRESSION* = { (dx^l, dy^l, dw^l, dh^l) }

where REGRESSION* is the set of position offsets and width and height offsets of the corresponding target frame positions relative to the anchor frames obtained from the regression feature map, dx^l is the l-th offset of the abscissa of the center point between the target frame and the anchor frame, dy^l is the l-th offset of the ordinate of the center point between the target frame and the anchor frame, dw^l is the l-th offset of the width between the target frame and the anchor frame, and dh^l is the l-th offset of the height between the target frame and the anchor frame;
The abscissa x* of the position of the mapped target frame is computed as:

x* = x_i^an + dx^l · w^an

where x* is the abscissa of the position of the mapped target frame;

the ordinate y* of the position of the mapped target frame is computed as:

y* = y_j^an + dy^l · h^an

where y* is the ordinate of the position of the mapped target frame;

the width w* of the mapped target frame is computed as:

w* = w^an · e^(dw^l)

where w* is the width of the mapped target frame;

and the height h* of the mapped target frame is computed as:

h* = h^an · e^(dh^l)

where h* is the height of the mapped target frame;
Through the above calculation, the position (x*, y*) and size (w*, h*) of the mapped target frame are obtained, and the mapped target positions and sizes are passed through a non-maximum suppression (NMS) algorithm to obtain the final target positions and sizes. In this way, the classification activation feature map and regression activation feature map of the image to be detected are mapped back onto the original image to be detected, realizing online detection of a new class; the method is particularly suitable for video monitoring, unmanned aerial vehicle ground detection, and high-value target discovery in remote sensing images.
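The decode-then-suppress step can be sketched as follows. This assumes the standard RPN-style box decoding (additive center offsets scaled by anchor size, exponential width/height offsets) consistent with the formulas above, and a plain greedy NMS; it is an illustrative sketch, not code from the patent.

```python
import math

def decode(anchor, offsets):
    """Map (dx, dy, dw, dh) offsets back onto an anchor (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def iou(a, b):
    """Intersection-over-union of two centre-format boxes (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```

With zero offsets, `decode` returns the anchor unchanged, and two heavily overlapping candidates collapse to the higher-scoring one under `nms`.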
Example 2:
based on the same inventive concept, the invention also provides an online target detection system based on depth feature matching, as shown in fig. 6:
the device comprises an image acquisition module, a feature map acquisition module and a target detection module;
the image acquisition module is used for acquiring a new class target sample image and an image to be detected;
the feature map acquisition module is used for inputting the new class target sample image and the image to be tested into a neural network backbone network and a region generating network after training is completed, and obtaining a classified activation feature map and a regression activation feature map of the image to be tested;
the target detection module is used for obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image obtained from the image to be detected; the training of the neural network backbone network and the area generating network is training based on the old class target sample image and the image to be tested, and the trained weight is obtained.
Preferably, the region generation network in the feature map acquisition module includes a region generation network classification branch and a region generation network regression branch, and the training of the region generation network includes:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature image, and calculating a cross entropy loss function of a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature image;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between the preset anchor frame positions and the real target positions in the image to be detected based on the first regression feature map;
and training the region generation network by a stochastic gradient descent method based on the cross entropy loss function and the loss function, to obtain the weights of the region generation network.
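A minimal sketch of the two-term training objective described above, assuming a PyTorch setup: cross-entropy for the anchor classification term is stated in the text, while the smooth-L1 form of the position loss is an assumption (the text only says "loss function").

```python
import torch
import torch.nn as nn

cls_loss_fn = nn.CrossEntropyLoss()   # anchor class vs. real class label
reg_loss_fn = nn.SmoothL1Loss()       # anchor position vs. real position (assumed form)

# Stand-ins for branch outputs over 8 sampled anchors.
cls_logits = torch.randn(8, 2, requires_grad=True)   # foreground / background
cls_labels = torch.randint(0, 2, (8,))
reg_pred   = torch.randn(8, 4, requires_grad=True)   # (dx, dy, dw, dh)
reg_target = torch.randn(8, 4)

# Combined objective; gradients from backward() drive the SGD update.
loss = cls_loss_fn(cls_logits, cls_labels) + reg_loss_fn(reg_pred, reg_target)
loss.backward()
```

In practice the two stand-in tensors would be the first classification and first regression feature maps, and `torch.optim.SGD` would consume the resulting gradients.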
Preferably, the feature map obtaining module is specifically configured to:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
Preferably, the target detection module is specifically configured to:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
Preferably, the activation function of the target detection module is as follows:
softmax(x_i) = e^(x_i − max(x)) / Σ_{j=1}^{C} e^(x_j − max(x))

where softmax(x_i) is the activation value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value among the input features, and C is the number of categories.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the scope of protection thereof, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that various changes, modifications or equivalents may be made to the specific embodiments of the application after reading the present invention, and these changes, modifications or equivalents are within the scope of protection of the claims appended hereto.
Claims (10)
1. An online target detection method based on depth feature matching is characterized by comprising the following steps:
acquiring a new class target sample image and an image to be detected;
inputting the new class target sample image and the image to be tested into a neural network backbone network and an area generating network after training is completed, and obtaining a class activation feature map and a regression activation feature map of the image to be tested;
obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image obtained from the image to be detected; the training of the neural network backbone network and the area generating network is training based on the old class target sample image and the image to be tested, and the trained weight is obtained.
2. The method of claim 1, wherein the region generation network comprises a region generation network classification branch and a region generation network regression branch, the training of the region generation network comprising:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature image, and calculating a cross entropy loss function of a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature image;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between the preset anchor frame positions and the real target positions in the image to be detected based on the first regression feature map;
and training the region generation network by a stochastic gradient descent method based on the cross entropy loss function and the loss function, to obtain the weights of the region generation network.
3. The method of claim 2, wherein inputting the new class target sample image and the image to be tested into the trained neural network backbone network and the region generation network to obtain a class activation feature map and a regression activation feature map of the image to be tested, comprises:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
4. The method of claim 3, wherein the obtaining the class and location box of the new class target sample image in the image to be measured based on the class activation feature map and the regression activation feature map comprises:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
6. An online object detection system based on depth feature matching, comprising:
the device comprises an image acquisition module, a feature map acquisition module and a target detection module;
the image acquisition module is used for acquiring a new class target sample image and an image to be detected;
the feature map acquisition module is used for inputting the new class target sample image and the image to be tested into a neural network backbone network and a region generating network after training is completed, and obtaining a classified activation feature map and a regression activation feature map of the image to be tested;
the target detection module is used for obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image obtained from the image to be detected; the training of the neural network backbone network and the area generating network is training based on the old class target sample image and the image to be tested, and the trained weight is obtained.
7. The system of claim 6, wherein the region generation network of the feature map acquisition module includes a region generation network classification branch and a region generation network regression branch, the training of the region generation network comprising:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature image, and calculating a cross entropy loss function of a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature image;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between the preset anchor frame positions and the real target positions in the image to be detected based on the first regression feature map;
and training the region generation network by a stochastic gradient descent method based on the cross entropy loss function and the loss function, to obtain the weights of the region generation network.
8. The system of claim 7, wherein the feature map acquisition module is specifically configured to:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
9. The system of claim 8, wherein the object detection module is specifically configured to:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
10. The system of claim 9, wherein the activation function of the object detection module is as follows:
softmax(x_i) = e^(x_i − max(x)) / Σ_{j=1}^{C} e^(x_j − max(x))

where softmax(x_i) is the activation value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value among the input features, and C is the number of categories.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211664409.4A CN116229142A (en) | 2022-12-23 | 2022-12-23 | Online target detection method and system based on depth feature matching |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116229142A true CN116229142A (en) | 2023-06-06 |
Family
ID=86590114
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211664409.4A Pending CN116229142A (en) | 2022-12-23 | 2022-12-23 | Online target detection method and system based on depth feature matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229142A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |