CN116229142A - Online target detection method and system based on depth feature matching - Google Patents

Online target detection method and system based on depth feature matching

Info

Publication number
CN116229142A
CN116229142A
Authority
CN
China
Prior art keywords
image
feature map
regression
target sample
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211664409.4A
Other languages
Chinese (zh)
Inventor
张天昊
田涛
王浩
苏龙飞
于泽婷
蔡慧敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Binhai Artificial Intelligence Innovation Center
Original Assignee
Tianjin Binhai Artificial Intelligence Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Binhai Artificial Intelligence Innovation Center filed Critical Tianjin Binhai Artificial Intelligence Innovation Center
Priority to CN202211664409.4A
Publication of CN116229142A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an online target detection method and system based on depth feature matching, comprising the following steps: acquiring a new class target sample image and an image to be detected; inputting the new class target sample image and the image to be detected into a trained neural network backbone network and a trained region generation network to obtain a classification activation feature map and a regression activation feature map of the image to be detected; and obtaining the category and the position frame of the new class target sample image in the image to be detected based on the classification activation feature map and the regression activation feature map. The invention avoids complicated data acquisition, data annotation and offline training of an algorithm model for the new class of target: by detecting one or a small number of new class target sample images with the neural network backbone network and the region generation network, the category and position frame of the new class target sample image in the image to be detected are obtained. The method is therefore particularly suitable for video monitoring, unmanned aerial vehicle ground detection and high-value target discovery in remote sensing images.

Description

Online target detection method and system based on depth feature matching
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an online target detection method and system based on depth feature matching.
Background
Deep learning has been widely applied in many fields of society. Target detection is one of the basic problems in the field of computer vision, and the best-performing target detection algorithms at present are based on deep learning. Current target detectors depend on a large amount of annotated data for training; large data sets exist for categories such as faces, pedestrians and vehicles, but for some new categories and new samples it is difficult to construct large data sets. Like most deep learning algorithms, target detection based on deep learning requires a large amount of labeled data for supervised training, and when the amount of labeled data is limited, the accuracy and generalization ability of the algorithm are difficult to guarantee. When a sufficient number of samples can be obtained, the data can be labeled manually, automatically or semi-automatically, thereby producing a large amount of labeled data. However, in some application scenarios it is difficult to obtain many sample data: for example, when searching for a suspicious person in a monitoring video, only one or a few pictures of the person are available, large-scale deep learning training is impossible, and an effective deep learning model is difficult to obtain, so the application of target detection algorithms based on deep learning is greatly limited.
Currently, targets with only one sample or a few samples are generally detected by template matching. Template matching is the most basic pattern recognition method and the most basic and commonly used matching method in image processing: the pattern of a specific object is searched for and located in an image, thereby recognizing the object. Template matching has inherent limitations, mainly that it can only handle parallel translation; if the matching target in the original image is rotated or changes in size, the performance of the algorithm drops sharply, that is, the template matching method generally has no rotation invariance. Deep learning methods are now widely applied, and the feature maps extracted by deep neural networks have translation invariance and rotation invariance, so they achieve very good results in the field of target detection.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an online target detection method based on depth feature matching, which comprises the following steps:
acquiring a new class target sample image and an image to be detected;
inputting the new class target sample image and the image to be detected into a trained neural network backbone network and a trained region generation network, and obtaining a classification activation feature map and a regression activation feature map of the image to be detected;
obtaining the category and the position frame of the new class target sample image in the image to be detected based on the classification activation feature map and the regression activation feature map;
wherein the new class target sample image is an image obtained from the image to be detected, and the neural network backbone network and the region generation network are trained on the basis of old class target sample images and images to be detected to obtain trained weights.
Preferably, the region generation network includes a region generation network classification branch and a region generation network regression branch, and the training of the region generation network includes:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and calculating a cross entropy loss function between a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature map;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between a preset anchor frame position and a target real position in the image to be detected based on the first regression feature map;
and training the regional generation network by adopting a random gradient descent method based on the cross entropy loss function and the loss function to obtain the weight of the regional generation network.
Preferably, the inputting the new class target sample image and the image to be tested into the trained neural network backbone network and the region generating network to obtain a class activation feature map and a regression activation feature map of the image to be tested includes:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
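For illustration only, a minimal PyTorch sketch of this correlation step is given below; the module structure, layer sizes and names (RPNBranch, kernel_conv, search_conv) are assumptions made for the example and are not part of the claimed method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RPNBranch(nn.Module):
    """One branch of the region generation network (classification or regression).
    A minimal sketch: layer sizes and names are illustrative assumptions."""
    def __init__(self, channels=256, out_mult=2, k=5):
        super().__init__()
        # template side: turns the sample-image depth features into convolution kernels
        self.kernel_conv = nn.Conv2d(channels, out_mult * k * channels, kernel_size=3)
        # search side: adjusts the depth feature map of the image to be detected
        self.search_conv = nn.Conv2d(channels, channels, kernel_size=3)
        self.out_channels = out_mult * k

    def forward(self, template_feat, search_feat):
        c = search_feat.shape[1]
        kernel = self.kernel_conv(template_feat)        # e.g. (1, out*k*c, 4, 4)
        kernel = kernel.reshape(self.out_channels, c, kernel.shape[-2], kernel.shape[-1])
        feat = self.search_conv(search_feat)            # e.g. (1, c, 22, 22)
        return F.conv2d(feat, kernel)                   # activation map, e.g. (1, out*k, 19, 19)

# usage sketch: one instance per branch
# cls_act = RPNBranch(out_mult=2)(backbone(sample_img), backbone(test_img))  # 2k channels
# reg_act = RPNBranch(out_mult=4)(backbone(sample_img), backbone(test_img))  # 4k channels
```

The same branch structure is instantiated twice, once for classification and once for regression; only the number of output channels differs.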
Preferably, the obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map includes:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
Preferably, the activation function is as follows:
$$\mathrm{softmax}(x_i)=\frac{e^{\,x_i-\max(x)}}{\sum_{j=1}^{C}e^{\,x_j-\max(x)}}$$

wherein softmax(x_i) is the activation function value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value of the input features, and C is the number of categories.
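As an illustration, the activation above can be computed in its numerically stable form as in the following minimal Python sketch (the tensor layout is an assumption):

```python
import torch

def stable_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax over C class scores; max(x) is subtracted for numerical stability."""
    x = x - x.max(dim=dim, keepdim=True).values
    e = x.exp()
    return e / e.sum(dim=dim, keepdim=True)
```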
Based on the same inventive concept, the invention also provides an online target detection system based on depth feature matching, which comprises:
the device comprises an image acquisition module, a feature map acquisition module and a target detection module;
the image acquisition module is used for acquiring a new class target sample image and an image to be detected;
the feature map acquisition module is used for inputting the new class target sample image and the image to be tested into a neural network backbone network and a region generating network after training is completed, and obtaining a classified activation feature map and a regression activation feature map of the image to be tested;
the target detection module is used for obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image obtained from the image to be detected; the training of the neural network backbone network and the area generating network is training based on the old class target sample image and the image to be tested, and the trained weight is obtained.
Preferably, the region generation network in the feature map acquisition module includes a region generation network classification branch and a region generation network regression branch, and the training of the region generation network includes:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and calculating a cross entropy loss function between a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature map;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between a preset anchor frame position and a target real position in the image to be detected based on the first regression feature map;
and training the regional generation network by adopting a random gradient descent method based on the cross entropy loss function and the loss function to obtain the weight of the regional generation network.
Preferably, the feature map obtaining module is specifically configured to:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
Preferably, the target detection module is specifically configured to:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
Preferably, the activation function of the target detection module is as follows:
$$\mathrm{softmax}(x_i)=\frac{e^{\,x_i-\max(x)}}{\sum_{j=1}^{C}e^{\,x_j-\max(x)}}$$

wherein softmax(x_i) is the activation function value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value of the input features, and C is the number of categories.
Compared with the closest prior art, the invention has the following beneficial effects:
the invention provides an online target detection method and system based on depth feature matching, comprising the following steps: acquiring a new class target sample image and an image to be detected; inputting the new class target sample image and the image to be tested into a neural network backbone network and an area generating network after training is completed, and obtaining a class activation feature map and a regression activation feature map of the image to be tested; obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map; the new class target sample image is an image obtained from the image to be detected; training the neural network backbone network and the area generating network based on the old class target sample image and the image to be tested and obtaining trained weights; the invention avoids complex data acquisition, data annotation and off-line training of an algorithm model of a new class of target sample image, and obtains the class and position frame of the new class of target sample image in the image to be detected by detecting one or a small number of new class of target sample images through a neural network backbone network and a regional generation network, thereby being particularly suitable for video monitoring, unmanned aerial vehicle earth detection and high-value target discovery of remote sensing images.
Drawings
FIG. 1 is a schematic flow chart of an online target detection method based on depth feature matching;
FIG. 2 is a training flowchart of an online target detection algorithm based on depth feature matching in an embodiment provided by the invention;
FIG. 3 is a flowchart of an online target detection algorithm based on depth feature matching in an embodiment provided by the invention;
FIG. 4 is a schematic diagram of a classification characteristic diagram according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a regression feature map according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an online object detection system based on depth feature matching according to the present invention.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings.
Example 1:
the on-line target detection method based on depth feature matching provided by the invention is shown in fig. 1, and comprises the following steps:
step 1: acquiring a new class target sample image and an image to be detected;
step 2: inputting the new class target sample image and the image to be detected into a trained neural network backbone network and a trained region generation network, and obtaining a classification activation feature map and a regression activation feature map of the image to be detected;
step 3: obtaining the category and the position frame of the new class target sample image in the image to be detected based on the classification activation feature map and the regression activation feature map;
wherein the new class target sample image is an image obtained from the image to be detected, and the neural network backbone network and the region generation network are trained on the basis of old class target sample images and images to be detected to obtain trained weights.
Specifically, step 1 includes:
according to the method, when the unknown type target is detected, the target can be detected in the image only by one new type target sample image, so that complicated data acquisition, data labeling and offline training of an algorithm model of the new type target sample image are avoided.
Specifically, step 2 includes:
and selecting image data covering various categories as much as possible from a public database or an actually collected data set, intercepting the category corresponding to the label from the original image to serve as an old category target sample image, and taking the original image as an image to be detected to form an old category target sample image and an image pair to be detected.
Neural network backbone networks include, but are not limited to, the following: AlexNet, VGGNet, GoogLeNet, ResNet (residual network), ResNeXt, ResNeSt, DenseNet (densely connected network), SqueezeNet, ShuffleNet, MobileNet, EfficientNet and Transformer. The region generation network includes a region generation network classification branch and a region generation network regression branch.
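For illustration, any of these feature extractors can be instantiated from a standard library; a possible choice via torchvision (an assumption about tooling in a recent torchvision release, not a requirement of the invention) is:

```python
import torchvision

# AlexNet convolutional feature extractor, as used in the embodiment below
backbone = torchvision.models.alexnet(weights=None).features
```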
As shown in fig. 2, the training phase for the neural network backbone network and the region generation network includes:
the resolution of the existing old class target sample image is adjusted to be 1 multiplied by 3 multiplied by 127, then the old class target sample image is input into a neural network backbone network, the neural network backbone network of the embodiment is AlexNet, a depth feature map of the old class target sample image is obtained, the resolution is 1 multiplied by 256 multiplied by 6, the region generation network classification branch and the region generation regression classification branch comprise convolution operation units, the convolution operation units comprise two-dimensional convolution operation, normalization operation and activation operation, the obtained depth feature map is input into the region generation network classification branch, obtaining a classification feature map of an old class target sample image, wherein the resolution of the classification feature map is 1×2560×4×4, generating a classification convolution kernel after proper matrix deformation operation, wherein the resolution of the classification convolution kernel is 10×256×4×4, generating a network regression branch by an input area of the obtained depth feature map, obtaining a regression feature map of the old class target sample image, wherein the resolution of the regression feature map is 1×5120×4×4, generating a regression convolution kernel after proper matrix deformation operation, and the resolution of the regression convolution kernel is 20×256×4×4;
the resolution of the image to be detected is adjusted to be 1 multiplied by 3 multiplied by 271, the required resolution of the image to be detected is higher than that of the old class target sample image, preferably the resolution is more than 2 times that of the old class target sample image, and then the image to be detected is input into AlexNet to obtain a depth feature map of the image to be detected, wherein the resolution is 1 multiplied by 256 multiplied by 24; inputting a depth feature image of an image to be detected into a region generation network classification branch to generate a classification feature image of the image to be detected, wherein the resolution of the classification feature image is 1×256×22×22, inputting the depth feature image of the image to be detected into a region generation network regression branch to generate a regression feature image of the image to be detected, the resolution of the regression feature image is 1×256×22×22, carrying out convolution operation on a classification convolution kernel of an old class target sample image on the classification feature image obtained by the image to be detected, thereby obtaining a classification activation feature image of the image to be detected, the resolution of the classification activation feature image is (1×10×19×19), the channel number of the classification activation feature image is 2k, carrying out convolution operation on a regression convolution kernel of the old class target sample image on the regression feature image obtained by the image to be detected, thereby obtaining a regression activation feature image of the image to be detected, the resolution of the regression activation feature image is (1×20×19×19), and the output channel number of the regression feature image is 4k;
the number of anchor frames is defined as 5 and the aspect ratios are [0.33,0.5,1,2,3], respectively. Calculating cross entropy loss functions on five anchor frames of each pixel point (19 multiplied by 19) on the classification characteristic diagram, wherein the cross entropy loss functions are as follows:
$$L_{cls}=-\frac{1}{N}\sum_{i=1}^{N}\left[c_i\log p_i+\left(1-c_i\right)\log\left(1-p_i\right)\right]$$

wherein L_cls is the cross entropy loss function, N is the number of samples, c_i is the class label of sample i (1 if a target is present, 0 if not), and p_i is the predicted probability that sample i is 1;
calculating smoothL1 loss functions of each pixel point on five anchor boxes on the regression feature map:
$$\mathrm{smooth}_{L1}(x,\sigma)=\begin{cases}0.5\,\sigma^{2}x^{2}, & |x|<\dfrac{1}{\sigma^{2}}\\[4pt]|x|-\dfrac{1}{2\sigma^{2}}, & |x|\geq\dfrac{1}{\sigma^{2}}\end{cases}$$

wherein smooth_{L1}(x, σ) is the target detection regression loss function, x is the input feature, and σ is an adjustment parameter;
the offset of the abscissa of the central point between the target frame and the anchor frame is as follows:
$$\delta[0]=\frac{T_x-A_x}{A_w}$$

wherein δ[0] is the offset of the abscissa of the central point between the target frame and the anchor frame, T_x is the abscissa of the central point of the target frame, A_x is the abscissa of the central point of the anchor frame, and A_w is the width of the anchor frame;
the offset of the ordinate of the central point between the target frame and the anchor frame is as follows:
$$\delta[1]=\frac{T_y-A_y}{A_h}$$

wherein δ[1] is the offset of the ordinate of the central point between the target frame and the anchor frame, T_y is the ordinate of the central point of the target frame, A_y is the ordinate of the central point of the anchor frame, and A_h is the height of the anchor frame;
the wide offset between the target frame and the anchor frame is:
$$\delta[2]=\ln\frac{T_w}{A_w}$$

wherein δ[2] is the wide offset between the target frame and the anchor frame, and T_w is the width of the target frame;
the high offset between the target frame and the anchor frame is:
$$\delta[3]=\ln\frac{T_h}{A_h}$$

wherein δ[3] is the high offset between the target frame and the anchor frame, and T_h is the height of the target frame;
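For illustration, the four offsets δ[0..3] defined above can be computed as in the following sketch (plain Python; the (cx, cy, w, h) box representation is an assumption):

```python
import math

def encode_offsets(target, anchor):
    """target, anchor: (cx, cy, w, h); returns (delta0, delta1, delta2, delta3)."""
    tx, ty, tw, th = target
    ax, ay, aw, ah = anchor
    return ((tx - ax) / aw,        # delta[0]: center-point abscissa offset
            (ty - ay) / ah,        # delta[1]: center-point ordinate offset
            math.log(tw / aw),     # delta[2]: width offset
            math.log(th / ah))     # delta[3]: height offset
```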
the loss function of the target detection algorithm is:
$$f_{loss}=L_{cls}+\lambda\sum_{i=0}^{3}\mathrm{smooth}_{L1}\left(\delta[i],\sigma\right)$$

wherein f_loss is the loss function of the target detection algorithm, λ is a hyper-parameter, and smooth_{L1}(δ[i], σ) is the target detection regression loss function of δ[i].
The loss function of the target detection algorithm is differentiated, and the weights of the neural network backbone network and the region generation network are obtained by stochastic gradient descent (SGD); the trained weights are used in the subsequent detection process.
The process of online detection of the new class target sample image, as shown in fig. 3, includes:
the resolution of the new class target sample image is adjusted to be 1 multiplied by 3 multiplied by 127, the new class target sample image is input into a neural network backbone network, a depth feature image of the new class target sample image is obtained, the neural network backbone network at the moment is consistent with a training stage, the weight obtained in the training stage is loaded, a network classification branch is generated in the input area of the depth feature image of the new class target sample image, a classification feature image of the new class target sample image is obtained, the resolution of the classification feature image is 1 multiplied by 2560 multiplied by 4, and a classification convolution kernel is generated after proper matrix deformation operation, wherein the resolution of the classification convolution kernel is 10 multiplied by 256 multiplied by 4; and inputting the depth feature images of the new class of target sample images into the area to generate network regression branches, obtaining regression feature images of the new class of target sample images, wherein the resolution of the regression feature images is 1 multiplied by 5120 multiplied by 4, and generating regression convolution kernels after proper matrix deformation operation, and the resolution of the regression convolution kernels is 20 multiplied by 256 multiplied by 4.
The resolution of the image to be detected is adjusted to 1×3×271×271; the resolution of the image to be detected is required to be higher than that of the new class target sample image, preferably more than 2 times higher. The image to be detected is then input into AlexNet to obtain a depth feature map of the image to be detected with resolution 1×256×24×24. The depth feature map of the image to be detected is input into the region generation network classification branch to generate a classification feature map of the image to be detected with resolution 1×256×22×22, and into the region generation network regression branch to generate a regression feature map of the image to be detected with resolution 1×256×22×22. The classification convolution kernel (10×256×4×4) of the new class target sample image is convolved over the classification feature map (1×256×22×22) of the image to be detected, yielding a classification activation feature map of the image to be detected with resolution 1×10×19×19 and 2k channels; the regression convolution kernel (20×256×4×4) of the new class target sample image is convolved over the regression feature map (1×256×22×22) of the image to be detected, yielding a regression activation feature map of the image to be detected with resolution 1×20×19×19 and 4k output channels. In the invention, the channel number of the feature maps is adjusted by the convolution operation units of the region generation network, so that the input feature maps required by the subsequent operations can be obtained. After the neural network backbone network and the region generation network have been trained in advance, when an unknown-class target needs to be detected, the new class target sample image only needs to be passed through the trained neural network backbone network and region generation network, and the classification activation feature map and regression activation feature map required subsequently can be obtained rapidly.
Specifically, step 3 includes:
the number and the shape of the anchor frames are consistent with those of the training stage, the number of the anchor frames is 5, the width of the anchor frames is 19, the height of the anchor frames is 19,
the aspect ratios of the anchor frames are [0.33, 0.5, 1, 2, 3]. For the classification feature map output by the region generation network classification branch, as shown in fig. 4, the feature tensor of the classification feature map is:
$$\mathrm{CLS}=\left\{cls_{ij}^{\,l}\right\},\qquad i\in[0,19),\ j\in[0,19),\ l\in[0,10)$$

wherein CLS is the feature tensor of the classification feature map, i is the index of the abscissa of the corresponding anchor frame, j is the index of the ordinate of the corresponding anchor frame, and cls_{ij}^l is the l-th target probability value at that position;
the odd channels represent targets in the anchor frame with the position, and the softmax activation function is used for selecting the odd channels
cls_{ij}^l with the largest class feature values. The softmax activation function is as follows:

$$\mathrm{softmax}(x_i)=\frac{e^{\,x_i-\max(x)}}{\sum_{j=1}^{C}e^{\,x_j-\max(x)}}$$

wherein softmax(x_i) is the activation function value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value of the input features, and C is the number of categories.
Let the positions corresponding to the several values of cls_{ij}^l with the largest class feature values be:

$$\mathrm{CLS}^{*}=\left\{(i,j,l)\ \middle|\ cls_{ij}^{\,l}\ \text{is among the largest class feature values}\right\}$$

wherein CLS* is the set of positions corresponding to the several values of cls_{ij}^l with the largest class feature values;
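For illustration, the selection of CLS* from the classification activation feature map might be implemented as in the following sketch (PyTorch; the channel layout, i.e. which channels are the "odd" target channels, and the number of kept positions are assumptions):

```python
import torch

k, topn = 5, 8                                  # anchors per position; positions to keep
cls_act = torch.randn(1, 2 * k, 19, 19)         # classification activation feature map
# assumption: channels come in (no-target, target) pairs per anchor,
# so the target scores sit on the odd channels of the original layout
pair = cls_act[0].view(k, 2, 19, 19)
scores = pair.softmax(dim=1)[:, 1]              # target probability per anchor / grid cell
flat = scores.reshape(-1)
best = flat.topk(topn).indices                  # indices of the largest class feature values
p = best // (19 * 19)                           # anchor index
i, j = (best % (19 * 19)) // 19, best % 19      # row and column on the 19x19 grid
```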
for a regression feature map of regional generation network regression branch output, as shown in fig. 5, the feature tensor of the regression feature map is:
$$\mathrm{REG}=\left\{\left(dx_{ij}^{\,p},\ dy_{ij}^{\,p},\ dw_{ij}^{\,p},\ dh_{ij}^{\,p}\right)\right\},\qquad i\in[0,19),\ j\in[0,19),\ p\in[0,5)$$

wherein REG is the feature tensor of the regression feature map, i is the index of the abscissa of the corresponding anchor frame, j is the index of the ordinate of the corresponding anchor frame, and dx_{ij}^p, dy_{ij}^p, dw_{ij}^p and dh_{ij}^p are respectively the p-th offsets of the center-point abscissa, center-point ordinate, width and height between the target frame and the anchor frame at that position;
the anchor frame set obtained according to the regional generation network classification branch is as follows:
$$\mathrm{ANCHOR}^{*}=\left\{\left(x_i^{a},\ y_j^{a},\ w^{a},\ h^{a}\right)\right\}$$

wherein ANCHOR* is the set of anchor frames obtained according to the region generation network classification branch, x_i^a is the i-th abscissa of the anchor frame center point, y_j^a is the j-th ordinate of the anchor frame center point, w^a is the width of the anchor frame, and h^a is the height of the anchor frame;
the position offset and the wide and high offset sets of the corresponding target frame position relative to the anchor frame can be obtained in the regression feature map as follows:
$$\mathrm{REGRESSION}^{*}=\left\{\left(dx^{*},\ dy^{*},\ dw^{*},\ dh^{*}\right)\right\}$$

wherein REGRESSION* is the set of position offsets and width/height offsets of the target frame relative to the anchor frame obtained from the regression feature map, dx* is the offset of the center-point abscissa between the target frame and the anchor frame, dy* is the offset of the center-point ordinate between the target frame and the anchor frame, dw* is the width offset between the target frame and the anchor frame, and dh* is the height offset between the target frame and the anchor frame;
computing the abscissa of the position of the mapped target frame
x*, the calculation formula is:

$$x^{*}=x_i^{a}+dx^{*}\cdot w^{a}$$

wherein x* is the abscissa of the position of the mapped target frame;

computing the ordinate of the position of the mapped target frame y*, the calculation formula is:

$$y^{*}=y_j^{a}+dy^{*}\cdot h^{a}$$

wherein y* is the ordinate of the position of the mapped target frame;

computing the width of the mapped target frame w*, the calculation formula is:

$$w^{*}=w^{a}\cdot e^{\,dw^{*}}$$

wherein w* is the width of the mapped target frame;

computing the height of the mapped target frame h*, the calculation formula is:

$$h^{*}=h^{a}\cdot e^{\,dh^{*}}$$

wherein h* is the height of the mapped target frame.

Through the above calculation, the position (x*, y*) and size (w*, h*) of the mapped target frame are obtained, and the mapped target positions and sizes are passed through a non-maximum suppression (NMS) algorithm to obtain the final target positions and sizes. In this way, the classification activation feature map and the regression activation feature map of the image to be detected are mapped back onto the original image to be detected, realizing online detection of the new class; the method is particularly suitable for video monitoring, unmanned aerial vehicle ground detection and high-value target discovery in remote sensing images.
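For illustration, the mapping and non-maximum suppression described above might be implemented as in the following sketch (PyTorch / torchvision; the tensor names and the IoU threshold are assumptions):

```python
import torch
from torchvision.ops import nms

def decode(anchors, offsets):
    """anchors: (N, 4) as (cx, cy, w, h); offsets: (N, 4) as (dx*, dy*, dw*, dh*)."""
    cx = anchors[:, 0] + offsets[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + offsets[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * offsets[:, 2].exp()
    h = anchors[:, 3] * offsets[:, 3].exp()
    # corner format (x1, y1, x2, y2) expected by NMS
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

# boxes = decode(selected_anchors, selected_offsets)
# keep = nms(boxes, selected_scores, iou_threshold=0.5)  # final target positions and sizes
```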
Example 2:
based on the same inventive concept, the invention also provides an online target detection system based on depth feature matching, as shown in fig. 6:
the device comprises an image acquisition module, a feature map acquisition module and a target detection module;
the image acquisition module is used for acquiring a new class target sample image and an image to be detected;
the feature map acquisition module is used for inputting the new class target sample image and the image to be tested into a neural network backbone network and a region generating network after training is completed, and obtaining a classified activation feature map and a regression activation feature map of the image to be tested;
the target detection module is used for obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image obtained from the image to be detected; the training of the neural network backbone network and the area generating network is training based on the old class target sample image and the image to be tested, and the trained weight is obtained.
Preferably, the region generation network in the feature map acquisition module includes a region generation network classification branch and a region generation network regression branch, and the training of the region generation network includes:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and calculating a cross entropy loss function between a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature map;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between a preset anchor frame position and a target real position in the image to be detected based on the first regression feature map;
and training the regional generation network by adopting a random gradient descent method based on the cross entropy loss function and the loss function to obtain the weight of the regional generation network.
Preferably, the feature map obtaining module is specifically configured to:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
Preferably, the target detection module is specifically configured to:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
Preferably, the activation function of the target detection module is as follows:
$$\mathrm{softmax}(x_i)=\frac{e^{\,x_i-\max(x)}}{\sum_{j=1}^{C}e^{\,x_j-\max(x)}}$$

wherein softmax(x_i) is the activation function value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value of the input features, and C is the number of categories.
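For illustration, the division of the system into its three modules might be organized as in the following structural sketch (class and method names are assumptions, with the implementations omitted):

```python
class ImageAcquisitionModule:
    def acquire(self):
        """Return a new class target sample image and an image to be detected."""
        raise NotImplementedError

class FeatureMapAcquisitionModule:
    def __init__(self, backbone, rpn_cls_branch, rpn_reg_branch):
        # the trained weights of the backbone and region generation network are loaded here
        self.backbone = backbone
        self.cls_branch = rpn_cls_branch
        self.reg_branch = rpn_reg_branch

    def activation_maps(self, sample_image, test_image):
        """Return the classification and regression activation feature maps."""
        raise NotImplementedError

class TargetDetectionModule:
    def detect(self, cls_activation, reg_activation, anchors):
        """Return the category and position frame of the new class target."""
        raise NotImplementedError
```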
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the scope of protection thereof, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that various changes, modifications or equivalents may be made to the specific embodiments of the application after reading the present invention, and these changes, modifications or equivalents are within the scope of protection of the claims appended hereto.

Claims (10)

1. An online target detection method based on depth feature matching is characterized by comprising the following steps:
acquiring a new class target sample image and an image to be detected;
inputting the new class target sample image and the image to be detected into a trained neural network backbone network and a trained region generation network, and obtaining a classification activation feature map and a regression activation feature map of the image to be detected;
obtaining the category and the position frame of the new class target sample image in the image to be detected based on the classification activation feature map and the regression activation feature map;
wherein the new class target sample image is an image obtained from the image to be detected, and the neural network backbone network and the region generation network are trained on the basis of old class target sample images and images to be detected to obtain trained weights.
2. The method of claim 1, wherein the region generation network comprises a region generation network classification branch and a region generation network regression branch, the training of the region generation network comprising:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and calculating a cross entropy loss function between a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature map;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between a preset anchor frame position and a target real position in the image to be detected based on the first regression feature map;
and training the regional generation network by adopting a random gradient descent method based on the cross entropy loss function and the loss function to obtain the weight of the regional generation network.
3. The method of claim 2, wherein inputting the new class target sample image and the image to be tested into the trained neural network backbone network and the region generation network to obtain a class activation feature map and a regression activation feature map of the image to be tested, comprises:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
4. The method of claim 3, wherein the obtaining the class and location box of the new class target sample image in the image to be measured based on the class activation feature map and the regression activation feature map comprises:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
5. The method of claim 4, wherein the activation function is as follows:
$$\mathrm{softmax}(x_i)=\frac{e^{\,x_i-\max(x)}}{\sum_{j=1}^{C}e^{\,x_j-\max(x)}}$$

wherein softmax(x_i) is the activation function value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value of the input features, and C is the number of categories.
6. An online object detection system based on depth feature matching, comprising:
the device comprises an image acquisition module, a feature map acquisition module and a target detection module;
the image acquisition module is used for acquiring a new class target sample image and an image to be detected;
the feature map acquisition module is used for inputting the new class target sample image and the image to be tested into a neural network backbone network and a region generating network after training is completed, and obtaining a classified activation feature map and a regression activation feature map of the image to be tested;
the target detection module is used for obtaining the category and the position frame of the new category target sample image in the image to be detected based on the category activation feature map and the regression activation feature map;
the new class target sample image is an image obtained from the image to be detected; the training of the neural network backbone network and the area generating network is training based on the old class target sample image and the image to be tested, and the trained weight is obtained.
7. The system of claim 6, wherein the region generation network of the feature map acquisition module includes a region generation network classification branch and a region generation network regression branch, the training of the region generation network comprising:
acquiring an old class target sample image and an image to be detected;
inputting the old class target sample image into the region generation network classification branch to generate a first classification feature map, and calculating a cross entropy loss function between a preset anchor frame class and a target real class label in the image to be detected based on the first classification feature map;
inputting the old class target sample image into the region generation network regression branch to generate a first regression feature map, and calculating a loss function between a preset anchor frame position and a target real position in the image to be detected based on the first regression feature map;
and training the regional generation network by adopting a random gradient descent method based on the cross entropy loss function and the loss function to obtain the weight of the regional generation network.
8. The system of claim 7, wherein the feature map acquisition module is specifically configured to:
inputting the new class target sample image into a trained neural network backbone network to obtain a first depth feature image, inputting the first depth feature image into the region generation network classification branch to generate a classification convolution kernel, and inputting the first depth feature image into the region generation network regression branch to generate a regression convolution kernel;
inputting the image to be tested into a trained neural network backbone network to obtain a second depth feature map, inputting the second depth feature map into the region generation network classification branch, generating a classification activation feature map based on the classification convolution kernel, inputting the second depth feature map into the region generation network regression branch, and generating a regression activation feature map based on the regression convolution kernel.
9. The system of claim 8, wherein the object detection module is specifically configured to:
activating the values of the classified activation feature images through an activation function, so that the classified activation feature images are mapped onto the images to be detected, and the categories of the new category target sample images are obtained;
and mapping the regression activation feature map to the image to be detected based on the anchor frame position obtained in advance during training, and obtaining a position frame of the new class target sample image.
10. The system of claim 9, wherein the activation function of the object detection module is as follows:
$$\mathrm{softmax}(x_i)=\frac{e^{\,x_i-\max(x)}}{\sum_{j=1}^{C}e^{\,x_j-\max(x)}}$$

wherein softmax(x_i) is the activation function value of the i-th feature, x_i is the i-th feature, x_j is the j-th feature, max(x) is the maximum value of the input features, and C is the number of categories.
CN202211664409.4A 2022-12-23 2022-12-23 Online target detection method and system based on depth feature matching Pending CN116229142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211664409.4A CN116229142A (en) 2022-12-23 2022-12-23 Online target detection method and system based on depth feature matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211664409.4A CN116229142A (en) 2022-12-23 2022-12-23 Online target detection method and system based on depth feature matching

Publications (1)

Publication Number Publication Date
CN116229142A true CN116229142A (en) 2023-06-06

Family

ID=86590114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211664409.4A Pending CN116229142A (en) 2022-12-23 2022-12-23 Online target detection method and system based on depth feature matching

Country Status (1)

Country Link
CN (1) CN116229142A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination