CN116091823A - Single-feature anchor-frame-free target detection method based on fast grouping residual error module - Google Patents
Single-feature anchor-frame-free target detection method based on fast grouping residual error module
- Publication number
- CN116091823A (application CN202211693108.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- feature
- branch
- criticality
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/764 - Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/82 - Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06V2201/07 - Indexing scheme relating to image or video recognition or understanding: target detection
- Y02T10/40 - Engine management systems
Abstract
The invention is suitable for the technical field of computer vision and deep learning, and relates to a single-feature anchor-free target detection method based on a fast grouping residual module, comprising the following steps: S10, image enhancement; S20, constructing a target detection network; S30, predicting a target class probability vector, a target position distance vector and a target criticality probability vector; S40, calculating the classification branch error, the position branch error and the criticality branch error, and updating the parameters of the target detection network until the number of iterations reaches a preset number; S50, inputting the image to be detected into the model obtained when the number of iterations reaches the preset number, obtaining the target class probability vector, target position distance vector and target criticality probability vector of the image, and adjusting the predicted class scores with the criticality branch to obtain the final target classification confidence and target position. The method has a simple flow and high detection accuracy, reduces the number of model parameters, and increases the detection speed of the model.
Description
Technical Field
The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a single-feature anchor-free target detection method based on a fast grouping residual module.
Background
Today, artificial intelligence technology is rapidly changing our world. As the eyes of the intelligent world, computer vision is a major branch of artificial intelligence: by capturing digital images, video or other visual input signals and training deep learning models, a computer can accurately identify and classify what it "sees" and then react to it. Target detection locates the bounding box of an object in an image and identifies the category to which the object belongs, and is widely applied in scientific research, industrial production, daily life and other downstream tasks.
To improve detection quality, dense-point prediction methods represented by FCOS and FoveaBox use re-weighting, and adopt the divide-and-conquer solution of the FPN: different scales are manually assigned so that targets fall onto the corresponding feature maps, helping the detector separate overlapping targets. During training, each target is assigned to a certain layer for learning, which is essentially another form of "anchor box": every detection point of every layer carries a hidden square anchor box of fixed size. Since each feature layer needs its own detection head, the large number of parameters slows detection down. The patent with publication number CN112818964A provides an unmanned aerial vehicle detection method based on the FoveaBox anchor-free neural network: initial parameters of the FoveaBox neural network model are first set, training-set images from an unmanned aerial vehicle database are input into the configured model for training, and a deep-learning-based unmanned aerial vehicle detection model is obtained; an unmanned aerial vehicle image to be detected is then input into the trained detection model to obtain multi-layer feature maps predicting the likelihood of a target; the output feature maps of the backbone network are processed by the position sub-network and classified pixel by pixel in combination with the detection-head sub-network, directly detecting the target category and position information. That patent likewise adopts the FoveaBox anchor-free neural network commonly used in the prior art; its application range and universality are assured, but its detection speed remains deficient.
Therefore, how to provide a target detection method with high detection speed while ensuring the detection accuracy of the target is a problem to be solved by those skilled in the art.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a single-feature anchor-free target detection method based on a fast grouping residual module, so as to solve the problem of the low detection speed of prior-art target detection methods; in addition, the invention also provides a storage medium and a terminal for single-feature anchor-free target detection based on the fast grouping residual module.
In order to solve the technical problems, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a single-feature anchor-free target detection method based on a fast grouping residual module, comprising the following steps:
s10, carrying out image enhancement on a training sample image;
s20, constructing a single-feature anchor-frame-free target detection network based on a rapid packet residual error module, wherein the single-feature anchor-frame-free target detection network comprises a cut RepVGG network, a U-shaped feature fusion network containing the rapid packet residual error module and a detection head, and the detection head comprises a classification branch network, a position branch network and a criticality branch network;
s30, inputting the training samples enhanced in the step S10 into the target detection network in the step S20, and predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality;
s40, marking positive and negative samples by adopting a positive and negative sample optimization strategy, respectively calculating a classification branch error, a position branch error and a criticality branch error according to the vector predicted in the step S30 and the real vector, and updating parameters of the target detection network based on the classification branch error, the position branch error and the criticality branch error until the iteration times reach preset times;
s50, taking the model obtained when the iteration times reach the preset times as the trained parameters of the target detection network model, inputting the image to be detected into the target detection network to obtain a target class probability vector, a target position distance vector and a target criticality probability vector of the image, adjusting the predicted class score by utilizing the criticality branches, and obtaining the final target classification confidence and the target position.
Further, in the step S10, the image enhancement includes randomly adjusting brightness, chromaticity and contrast; cropping the image; randomly flipping the image left-right with 50% probability; randomly scaling the image with 50% probability, where the reduction factor is not lower than 0.5 times the original image and the magnification factor is not higher than 3 times the original image; and randomly mosaic-stitching images with 30% probability; finally obtaining enhanced image data I ∈ R^(H×W×3), where H and W are the height and width of the original image, respectively, and 3 is the number of image channels.
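For illustration, a minimal sketch of such an enhancement pipeline is given below (OpenCV-based; the jitter ranges and the crop ratio are not fixed by the patent and are assumptions here, hue jitter and bounding-box adjustment are omitted, and the 30%-probability mosaic step would combine four such images):

```python
import random
import numpy as np
import cv2

def enhance(image):
    """Minimal sketch of the S10 augmentations; all ranges are assumptions."""
    h, w = image.shape[:2]
    # Randomly adjust contrast (alpha) and brightness (beta).
    image = cv2.convertScaleAbs(image, alpha=random.uniform(0.7, 1.3),
                                beta=random.uniform(-32, 32))
    # Random crop (assumed ratio 0.8 of the original side lengths).
    if random.random() < 0.5:
        ch, cw = int(h * 0.8), int(w * 0.8)
        y0, x0 = random.randint(0, h - ch), random.randint(0, w - cw)
        image = image[y0:y0 + ch, x0:x0 + cw]
    # Random left-right flip with 50% probability.
    if random.random() < 0.5:
        image = np.ascontiguousarray(image[:, ::-1])
    # Random rescale with 50% probability, factor within [0.5, 3.0].
    if random.random() < 0.5:
        s = random.uniform(0.5, 3.0)
        image = cv2.resize(image, (int(image.shape[1] * s), int(image.shape[0] * s)))
    return image
```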
Further, the specific steps of the step S20 are as follows:
s201, using the modified RepVGG-A0 as a backbone network, and only reserving C in the RepVGG-A0 3 and C4 The step length of the two feature layers is 8 and 16 respectively, and the number of the feature map channels is 96 and 192 respectively;
s202, adjusting the channel number of the input U-shaped feature fusion network, setting the channel number of the output feature layer as N, if C 3 Or C 4 The number of channels is not equal toThen its channel number is adjusted to +.1 convolution>Respectively marked as D 3 and D4 Otherwise, directly recorded as D 3 and D4 The method comprises the steps of carrying out a first treatment on the surface of the Three-branch grouping convolution block pair C consisting of 3×3 convolutions grouped into 4 and step size of 2, 1×1 convolutions grouped into 4 and step size of 2 and residual connections in parallel 4 Downsampling twice, denoted as D 5 and D6 The number of channels is ∈>
S203, feature D 6 Inputting the fast grouping residual error module to obtain P 6 ;
S204, using bilinear interpolation to P 6 Upsampling the feature map P 6 Amplified to the previous layer D 5 The same size, then D 5 And P 6 The characteristic graphs with the number of channels being N are obtained by connecting the characteristic graphs in series, and finally the characteristics after the series connection are input into the fast grouping residual error module in the step S203 to obtain P 5 ;
S205, pair D 4 and D3 Repeating the steps S203 to S204 to obtain P respectively 4 and P3 Finally only take P 3 As a final feature layer;
s206, P pair 3 Three-branch grouping convolution blocks with the number of N of two input and output channels are respectively used and are respectively marked as P 3' and P3 "; p pair P 3 ' generating classification branches by using a 1×1 convolution with the number of output channels C, C being the number of classes, to obtain P 3 Probability vector of each pixel point of each layerWherein i is P 3 Upper pixel abscissa, +.>j is P 3 Upper pixel ordinate,/->C is the class number, c= {1, how much, C; p pair P 3 "generating positioning branches using a 1×1 convolution with 4 output channels, yielding P 3 Position distance vector from each pixel point of layer to boundary frame +.>The method comprises the steps of carrying out a first treatment on the surface of the For P 3 Generating criticality branches using a 1 x 1 convolution with 1 output channel number to obtain P 3 Probability vector of criticality of each pixel point of layer +.>
Further, the specific construction steps of the fast grouping residual module in step S203 are as follows:
s231, recording the characteristic data input into the fast grouping residual error module as F 0 Firstly, three 3X 3 convolutions with 4 groups, three branch grouping convolutions with 1X 1 convolutions with 4 groups and three branch grouping convolutions with parallel residual connection are continuously used for extracting features, the number of input channels and the number of output channels of a convolution layer are equal, and feature information after each time of passing through the three branch grouping convolutions is stored and is respectively recorded asAnd->
S232, adding a confusion module, wherein each layer of output characteristics is identical to the input characteristics F 0 Residual connection is carried out:
s233, feature information F 0 、And->4 layers are connected in series to obtain a specific input characteristic F 0 New feature F4 times greater than the number of channels 2 :
Wherein concat is a tandem operation;
s234, convolving the feature F with a 1X 1 convolution 2 The number of channels is reduced to F 0 The same size as F 0 Adding to obtain a feature layer F as local feature fusion 3 :
wherein ,is a 1 x 1 convolution, delta is a ReLU activation function, F 3 And the output result of the fast grouping residual error module is obtained.
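Under the same assumptions, the module itself could look as follows (reusing TriBranchGroupConv from the sketch above; whether the residual-added features or the raw block outputs feed the next block, and the order of ReLU and the final addition, are not fixed by the text and are assumptions):

```python
class FastGroupedResidual(nn.Module):
    """Sketch of S231-S234: three cascaded three-branch grouped convolutions,
    per-layer residual connections with the input, 4x concatenation, and a
    1x1 fusion back to the input channel width."""
    def __init__(self, ch):
        super().__init__()
        self.blocks = nn.ModuleList(TriBranchGroupConv(ch, ch) for _ in range(3))
        self.fuse = nn.Conv2d(4 * ch, ch, 1, bias=False)
        self.act = nn.ReLU(inplace=True)

    def forward(self, f0):
        feats, x = [f0], f0
        for blk in self.blocks:              # S231: consecutive grouped blocks
            x = blk(x)
            feats.append(x + f0)             # S232: residual connection with F0
        f2 = torch.cat(feats, dim=1)         # S233: F2 with 4x the channels of F0
        return self.act(self.fuse(f2)) + f0  # S234: F3 = delta(W(F2)) + F0
```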
Further, the specific steps of the step S30 are as follows:
s301, inputting the enhanced image into the target detection network, and performing characteristic image processing by using the single-characteristic anchor-frame-free target detection network based on the fast packet residual error module to obtain a predicted value of the image;
s302, generating positive sample points of targets in the image by the positive and negative sample optimization strategyWherein i is P 3 Upper pixel abscissa +.>j is P 3 Upper pixel ordinate +.>
S303, obtaining the classification Loss L by adopting a calculation mode in the Focal Loss cls Obtaining IoU Loss L by adopting an IoU Loss medium calculation mode iou Calculating a criticality loss L key The method comprises the steps of carrying out a first treatment on the surface of the The total loss L of the network is the sum of three branch losses:
L=L cls +L iou +L key
and S304, adjusting the parameters of the target detection network according to the loss result of the target detection network, executing the steps S301 to S303, and updating the parameters of the target detection network until the iteration times reach the preset times.
Further, the specific steps of the positive and negative sample optimization strategy in step S302 are as follows:
s401, record feature layer P 3 The coordinates of the upper pixel point are (i, j), for a pair of H W input images, K target points are assumed, each target point B k K is more than or equal to 1 and less than or equal to K, and comprises the upper left coordinate, the lower right coordinate and the labeling information of the target category, which is marked as +.> wherein And->Calculate eachThe target area, according to the target area from small to large, sequentially calculating the center point as positive sample point +.> wherein Placing the object with the smallest area at P 3 Applying;
s402, if a center point placed later collides with a positive sample point of the former, sequentially searching secondary advantages by taking a conflict point of a target with a larger area as a circle center according to the left-upper right-lower order, ensuring that the intersection ratio of an intersection pattern formed by a bounding box formed by the conflict point position after translating to the secondary advantages is larger than 0.7, otherwise, giving up marking the point;
s403, repeating the step S402 until all target marks are completed, wherein the rest unmarked detection points are negative sample points (i neg ,j neg )。
Further, the criticality loss L_key in step S303 is calculated as follows: for any target B_k falling onto the P3 layer, according to the positive and negative sample assignment strategy of steps S401 to S403, the criticality weight of the positive sample point (i_pos, j_pos) is set to 1; the remaining negative sample points (i_neg, j_neg) are all non-key points. After translating the bounding box located at the positive sample point to a negative sample point, the intersection-over-union of the two boxes gives the ground-truth criticality weight of that non-key point; if a detection point receives different weights, the highest weight is kept. Performing this operation on all points of the P3 layer and computing the criticality loss L_key with binary cross-entropy gives:
L_key = −(1/|P3|) · Σ_(i,j) [ k̂(i,j)·log k(i,j) + (1 − k̂(i,j))·log(1 − k(i,j)) ],
where k(i,j) is the criticality-branch predicted probability vector on the P3 layer obtained after the sample is input to the neural network, and k̂(i,j) is the ground-truth criticality weight.
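The soft targets for this loss might be built as below (reusing shifted_iou from the assignment sketch above; the dense double loop is written for clarity, not speed):

```python
import numpy as np

def criticality_targets(positives, targets, fh, fw, stride=8):
    """Ground truth for L_key: 1 at positive points; elsewhere the IoU between
    a positive's box and the same box translated to that point, keeping the
    maximum where several targets overlap."""
    gt = np.zeros((fh, fw), dtype=np.float32)
    for (i, j), k in positives.items():
        x1, y1, x2, y2 = targets[k]
        w_box, h_box = x2 - x1, y2 - y1
        for jj in range(fh):
            for ii in range(fw):
                iou = shifted_iou(w_box, h_box, (ii - i) * stride, (jj - j) * stride)
                gt[jj, ii] = max(gt[jj, ii], iou)
        gt[j, i] = 1.0  # the key point itself
    return gt
# L_key is then e.g. torch.nn.functional.binary_cross_entropy(pred, torch.from_numpy(gt)).
```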
Further, the specific steps of the step S50 are as follows:
s501, filling 0 into the edge of a 1 multiplied by 1 convolution kernel, and converting the 1 multiplied by 1 convolution kernel into a 3 multiplied by 3 convolution kernel;
s502, residual information of a corresponding channel is converted into a 1 multiplied by 1 convolution kernel, and then the 1 multiplied by 1 convolution kernel is converted into a 3 multiplied by 3 convolution kernel according to the step S51;
s503, adding the converted two 3X 3 convolution kernels and the original 3X 3 convolution kernels to form a new convolution kernel, and fusing each three-branch grouping convolution into a 3X 3 grouping convolution;
s504, inputting the image to be detected into the target detection network, and obtaining predicted probability vectors of each pixel point from the classification branch of the detection head wherein C is the category number; obtaining the position distance vector +.>Mapping to (0 with the ReLU function, + -infinity); obtaining a criticality probability vector +.>The value is then mapped onto (0, 1) using a sigmoid function;
wherein β and γ are regulatory factors;
s506, classification confidenceCalculating by using the prediction result of the critical degree branch corrected by the kernel function to obtain the confidence degree of each category of each corrected pixel point>
Wherein α is a modulator;
s507, performing primary screening by using a maximum pooling function with a convolution kernel size of 3, therebySelecting the first 100 predicted points with the confidence coefficient from large to small, filtering out points with the confidence coefficient lower than 0.05, calculating the distances from the predicted points to four sides by using the position distance vector, and removing redundant predicted frames by using a non-maximum value inhibition method; the retained targets and the corresponding categories and bounding boxes are the prediction results of the network on the input image.
In a second aspect, the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method as described above.
In a third aspect, the present invention further provides an electronic terminal, including: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the terminal executes the method.
Compared with the prior art, the single-feature anchor-free target detection method based on the fast grouping residual module has at least the following beneficial effects:
The method has a simple flow and is convenient to operate; detection accuracy is effectively improved by the U-shaped feature fusion network containing the fast grouping residual module. Prediction uses a single feature map with stride 8, which has 4 times fewer detection points than a single feature map with stride 4, and a re-parameterization method merges model parameters to increase detection speed at inference time. The positive and negative sample optimization strategy avoids ambiguous detection-point labels caused by the reduction of feature layers, and the criticality branch reduces the influence of low-quality non-key edge points on the detection result. The method effectively improves detection accuracy, reduces the number of model parameters and accelerates detection; compared with existing algorithms it offers considerable gains in both accuracy and speed, and has good application prospects in the field of target detection.
Drawings
In order to more clearly illustrate the solution of the invention, a brief description will be given below of the drawings required for the description of the embodiments, it being apparent that the drawings in the following description are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a single-feature anchor-frame-free target detection method based on a fast packet residual error module according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a U-shaped feature fusion network containing a fast packet residual module according to a single feature anchor-free frame target detection method based on the fast packet residual module according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a fast packet residual module according to a single-feature anchor-frame-free target detection method based on the fast packet residual module according to an embodiment of the present invention.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The invention provides a single-feature anchor-free target detection method based on a fast grouping residual module, applied to the target detection process and comprising the following steps:
s10, carrying out image enhancement on a training sample image;
s20, constructing a single-feature anchor-free target detection network based on a fast grouping residual module, wherein the network comprises a pruned RepVGG backbone, a U-shaped feature fusion network containing the fast grouping residual module, and a detection head, the detection head comprising a classification branch network, a position branch network and a criticality branch network;
s30, inputting the training sample enhanced in the step S10 into a target detection network in the step S20, and predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality;
s40, marking positive and negative samples by adopting a positive and negative sample optimization strategy, respectively calculating a classification branch error, a position branch error and a criticality branch error according to the vector predicted in the step S30 and the real vector, and updating parameters of a target detection network based on the classification branch error, the position branch error and the criticality branch error until the iteration times reach preset times;
s50, taking the model obtained when the number of iterations reaches the preset number as the trained target detection network, inputting the image to be detected into the target detection network to obtain the target class probability vector, target position distance vector and target criticality probability vector of the image, and adjusting the predicted class scores with the criticality branch to obtain the final target classification confidence and target position.
The method has the advantages of simple flow and high detection precision, reduces the parameter number of the model, and improves the detection speed of the model.
In order to make the person skilled in the art better understand the solution of the present invention, the technical solution of the embodiment of the present invention will be clearly and completely described below with reference to the accompanying drawings.
The dataset used by the single-feature anchor-free target detection method based on the fast grouping residual module is the union of VOC2007 and VOC2012, with 20 target classes. The platform is a Windows Server 2019 operating system with an Intel(R) Xeon(R) Gold 6226R CPU and one Nvidia GeForce RTX 3060 GPU, and the model of this embodiment is trained under the PyTorch 1.9 deep learning framework with CUDA 11.3.109 and cuDNN 8.2.1.32. The backbone network is initialized with RepVGG weights pre-trained on ImageNet, and the network is optimized with Adam at a batch size of 16 for 260 epochs in total. The 1st epoch uses a warm-up technique in which the learning rate rises gradually from 10^-5 to 10^-3; the learning rate is then held at 10^-3 for 100 epochs, after which it is gradually reduced by a cosine annealing function to 10^-4 at epoch 200, and finally gradually reduced by a cosine annealing function to 5×10^-6 until training ends.
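The schedule described above can be reproduced as follows (a sketch; only the endpoint values are fixed by the text, so the interpolation inside each phase is an assumption):

```python
import math

def learning_rate(epoch, total=260):
    """Warm-up in epoch 1 (1e-5 -> 1e-3), constant until epoch 100, cosine
    annealing to 1e-4 by epoch 200, then cosine annealing to 5e-6."""
    if epoch < 1:                              # fractional epoch during warm-up
        return 1e-5 + (1e-3 - 1e-5) * epoch
    if epoch <= 100:
        return 1e-3
    if epoch <= 200:
        t = (epoch - 100) / 100.0              # cosine: 1e-3 -> 1e-4
        return 1e-4 + 0.5 * (1e-3 - 1e-4) * (1 + math.cos(math.pi * t))
    t = (epoch - 200) / (total - 200.0)        # cosine: 1e-4 -> 5e-6
    return 5e-6 + 0.5 * (1e-4 - 5e-6) * (1 + math.cos(math.pi * t))
```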
The invention provides a single-feature anchor-free target detection method based on a fast grouping residual module, applied to the target detection process. With reference to figs. 1 to 3, in this embodiment the method comprises the following steps:
s10, image enhancement: image enhancement is carried out on the training sample picture, and the method comprises the following steps: randomly adjusting brightness, chromaticity and contrast; randomly cutting the image; randomly turning over the image left and right with 50% probability; randomly scaling the image with a 50% probability; performing random mosaic stitching on the images with the probability of 30%; finally, enhanced image data I epsilon R is obtained 512 ×512×3 。
S20, constructing the network: constructing a single-feature anchor-free target detection network based on the fast grouping residual module, where the backbone is a pruned RepVGG network, the feature fusion network is a U-shaped feature fusion network containing the fast grouping residual module, and the detection head consists of three branch networks: a classification branch network, a position branch network and a criticality branch network;
specifically, in this embodiment, the specific steps of step S20 are as follows:
s201, using the modified RepVGG-A0 as a backbone network, and only reserving C in the RepVGG-A0 3 and C4 The step length of the two feature layers is 8 and 16 respectively, and the number of the feature map channels is 96 and 192 respectively;
s202, adjusting the number of channels of the input U-shaped feature fusion network; let the number of output feature layer channels be 128, if C 3 Or C 4 If the number of channels is not equal to 64, the number of channels is adjusted to 64 by using 1×1 convolution, and is respectively denoted as D 3 and D4 The method comprises the steps of carrying out a first treatment on the surface of the Three-branch grouping convolution block pair C consisting of 3×3 convolutions grouped into 4 and step size of 2, 1×1 convolutions grouped into 4 and step size of 2 and residual connections in parallel 4 Downsampling twice, denoted as D 5 and D6 The number of the channels is 64;
s203, feature D 6 Inputting the fast grouping residual error module to obtain P 6 ;
S204, using bilinear interpolation to P 6 Upsampling the feature map P 6 Amplified to the previous layer D 5 The same size, then D 5 And P 6 The feature images with 128 channels are obtained by series connection, and finally the features after series connection are input into the fast in the step S203Fast packet residual module gets P 5 ;
S205, pair D 4 and D3 Repeating steps S203 to S204 to obtain P respectively 4 and P3 Finally only take P 3 As a final feature layer;
s206, generating three network branches; p pair P 3 Three-branch grouping convolution blocks with 128 number of input and output channels are used respectively and are marked as P respectively 3' and P3 "; p pair P 3 ' generating classification branches using a 1×1 convolution with 20 output channels, yielding P 3 Probability vector of each pixel point of each layerP pair P 3 "generating positioning branches using a 1×1 convolution with 4 output channels, yielding P 3 Position distance vector from each pixel point of layer to boundary frame +.>P pair P 3 "generating criticality branches using a 1×1 convolution with 1 output channel number, yielding P 3 Probability vector of criticality of each pixel point of layer +.>
S30, training: inputting the training samples enhanced in step S10 into the single-feature anchor-free target detection network based on the fast grouping residual module constructed in step S20, and predicting the target class probability vector, target position distance vector and target criticality probability vector;
specifically, in the present embodiment, the steps of step S30 are as follows:
s301, inputting the enhanced image into a target detection network, and performing characteristic image processing by using a single-characteristic anchor-frame-free target detection network based on a fast packet residual error module to obtain a predicted value of the image;
s302, generating positive sample points of targets in images by positive and negative sample optimization strategiesWherein i is P 3 The upper pixel point abscissa i= {0,1, and (64) j is P 3 The upper pixel ordinate j= {0,1, carrying out the process of 64;
s303, calculating loss with the image predicted value: obtaining the classification Loss L by adopting a calculation mode in Focal Loss cls Obtaining IoU Loss L by adopting an IoU Loss medium calculation mode iou Calculating a criticality loss L key The method comprises the steps of carrying out a first treatment on the surface of the Calculating the sum of three branch losses by adopting a formula (6);
s304, adjusting network parameters of the target detection network according to the loss result of the target detection network, executing steps S301 to S303, carrying out back propagation, and updating the target detection network parameters until the iteration times reach the preset times.
S40, calculating the loss: labelling positive and negative samples with the positive and negative sample optimization strategy; calculating the classification branch error from the predicted class probability vector and the true class probability vector of each training sample; calculating the position branch error from the predicted position distance vector and the true position distance vector of each training sample; calculating the criticality branch error from the predicted criticality probability vector and the true criticality probability vector of each training sample, using the binary cross-entropy given above; and, based on these three errors, back-propagating and updating the target detection network parameters until the number of iterations reaches the preset number.
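A compact sketch of this three-branch loss is given below; sigmoid_focal_loss is the torchvision implementation of Focal Loss, while the per-positive IoU values (loc_iou) are assumed to be computed elsewhere from the predicted and ground-truth boxes:

```python
import torch.nn.functional as F
from torchvision.ops import sigmoid_focal_loss

def total_loss(cls_pred, cls_gt, loc_iou, key_pred, key_gt):
    """L = L_cls + L_iou + L_key (S40)."""
    l_cls = sigmoid_focal_loss(cls_pred, cls_gt, reduction='mean')  # Focal Loss
    l_iou = (1.0 - loc_iou).mean()       # IoU Loss: 1 - IoU over positive points
    l_key = F.binary_cross_entropy(key_pred, key_gt)  # criticality branch (BCE)
    return l_cls + l_iou + l_key
```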
S50, inference: taking the model obtained when the number of iterations reaches the preset number as the trained target detection network, inputting the image to be detected into the target detection network to obtain the target class probability vector, target position distance vector and target criticality probability vector of the image, and adjusting the predicted class scores with the criticality branch to obtain the final target classification confidence and target position;
specifically, in this embodiment, the specific steps of step S50 are as follows:
s501, filling 0 into the edge of a 1 multiplied by 1 convolution kernel, and converting the 1 multiplied by 1 convolution kernel into a 3 multiplied by 3 convolution kernel;
s502, for residual connection, residual information of a corresponding channel is converted into a 1×1 convolution kernel, and then the 1×1 convolution kernel is converted into a 3×3 convolution kernel according to the method in the step S501;
s503, adding the two converted 3X 3 convolution kernels and the original 3X 3 convolution kernels to form a new convolution kernel, so that each three-branch grouping convolution is fused to form a 3X 3 grouping convolution;
s504, inputting the image to be detected into a target detection network, and obtaining predicted probability vectors of each pixel point class from the classification branch of the detection headFrom the positioning branch of the detection head, the position distance vector of each pixel point to the boundary frame is obtained>Mapping to (0 with the ReLU function, + -infinity); obtaining a criticality probability vector +.>The value is then mapped onto (0, 1) using a sigmoid function;
s505, utilizing formulaThe prediction of critical branches by the kernel function T (x)>Correcting, namely taking beta=2 and gamma=1;
s506, utilizing formulaObtaining the confidence coefficient of each category of each corrected pixel pointTaking α=0.6;
s507, performing preliminary screening by using a maximum pooling function with a convolution kernel size of 3, and then performing secondary screeningSelecting the first 100 predicted points with the confidence coefficient from large to small, and filtering out points with the confidence coefficient lower than 0.05; finally, calculating the distances from the predicted point to the four sides by using the position distance vector, and removing redundant predicted frames by using a non-maximum value inhibition method; the retained targets and the corresponding categories and bounding boxes are the prediction results of the network on the input image.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements any of the methods of the embodiment.
The embodiment of the invention also provides an electronic terminal, which comprises: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the terminal executes any one of the methods in the embodiment.
As will be appreciated by those of ordinary skill in the art, all or part of the steps for implementing the above method embodiments may be performed by hardware related to a computer program. The aforementioned computer program may be stored in a computer-readable storage medium; when executed, the program performs the steps comprising the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The electronic terminal provided in this embodiment includes a processor, a memory, a transceiver, and a communication interface, where the memory and the communication interface are connected to the processor and the transceiver and complete communication with each other, the memory is used to store a computer program, the communication interface is used to perform communication, and the processor and the transceiver are used to run the computer program, so that the electronic terminal performs each step of the above method.
The method has a simple flow and is convenient to operate; detection accuracy is effectively improved by the U-shaped feature fusion network containing the fast grouping residual module. Prediction uses a single feature map with stride 8, which has 4 times fewer detection points than a single feature map with stride 4, and a re-parameterization method merges model parameters to increase detection speed at inference time. The positive and negative sample optimization strategy avoids ambiguous detection-point labels caused by the reduction of feature layers, and the criticality branch reduces the influence of low-quality non-key edge points on the detection result. The method effectively improves detection accuracy, reduces the number of model parameters and accelerates detection; compared with existing algorithms it offers considerable gains in both accuracy and speed, and has good application prospects in the field of target detection.
It is apparent that the above-described embodiments are merely preferred embodiments of the present invention and do not limit the scope of the invention. This invention may be embodied in many different forms; these embodiments are provided so that the disclosure of the invention will be thorough and complete. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. All equivalent structures made using the content of the specification and drawings of the invention, applied directly or indirectly in other related technical fields, likewise fall within the scope of the invention.
Claims (10)
1. A single-feature anchor-free target detection method based on a fast grouping residual module, characterized by comprising the following steps:
s10, carrying out image enhancement on a training sample image;
s20, constructing a single-feature anchor-frame-free target detection network based on a rapid packet residual error module, wherein the single-feature anchor-frame-free target detection network comprises a cut RepVGG network, a U-shaped feature fusion network containing the rapid packet residual error module and a detection head, and the detection head comprises a classification branch network, a position branch network and a criticality branch network;
s30, inputting the training samples enhanced in the step S10 into the target detection network in the step S20, and predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality;
s40, marking positive and negative samples by adopting a positive and negative sample optimization strategy, respectively calculating a classification branch error, a position branch error and a criticality branch error according to the vector predicted in the step S30 and the real vector, and updating parameters of the target detection network based on the classification branch error, the position branch error and the criticality branch error until the iteration times reach preset times;
s50, taking the model obtained when the iteration times reach the preset times as the trained parameters of the target detection network model, inputting the image to be detected into the target detection network to obtain a target class probability vector, a target position distance vector and a target criticality probability vector of the image, adjusting the predicted class score by utilizing the criticality branches, and obtaining the final target classification confidence and the target position.
2. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 1, wherein in the step S10 the image enhancement includes randomly adjusting brightness, chromaticity and contrast; cropping the image; randomly flipping the image left-right with 50% probability; randomly scaling the image with 50% probability, where the reduction factor is not lower than 0.5 times the original image and the magnification factor is not higher than 3 times the original image; and randomly mosaic-stitching images with 30% probability; finally obtaining enhanced image data I ∈ R^(H×W×3), where H and W are the height and width of the original image, respectively, and 3 is the number of image channels.
3. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 1, wherein the specific steps of the step S20 are as follows:
S201, using the modified RepVGG-A0 as the backbone network, retaining only the C3 and C4 feature layers of RepVGG-A0, whose strides are 8 and 16 and whose feature-map channel numbers are 96 and 192, respectively;
S202, adjusting the number of channels entering the U-shaped feature fusion network: let the number of output feature-layer channels be N; if the channel number of C3 or C4 is not equal to N/2, adjust it to N/2 with a 1×1 convolution and denote the results D3 and D4, otherwise denote them directly as D3 and D4; downsample C4 twice with three-branch grouped convolution blocks, each consisting of a 3×3 convolution with 4 groups and stride 2, a 1×1 convolution with 4 groups and stride 2, and a parallel residual connection, obtaining D5 and D6 in turn, each with N/2 channels;
S203, inputting feature D6 into the fast grouping residual module to obtain P6;
S204, upsampling P6 with bilinear interpolation so that the feature map P6 is enlarged to the same size as the previous layer D5, concatenating D5 and P6 to obtain a feature map with N channels, and finally inputting the concatenated features into the fast grouping residual module of step S203 to obtain P5;
S205, repeating steps S203 to S204 for D4 and D3 to obtain P4 and P3, respectively, and finally taking only P3 as the final feature layer;
S206, applying to P3 two three-branch grouped convolution blocks whose input and output channel numbers are both N, denoted P3' and P3''; on P3', generating the classification branch with a 1×1 convolution with C output channels, C being the number of classes, to obtain the class probability vector p(i,j,c) of every pixel of the P3 layer, where i is the pixel abscissa on P3, j is the pixel ordinate on P3, and c ∈ {1, …, C} is the class index; on P3'', generating the positioning branch with a 1×1 convolution with 4 output channels to obtain the position distance vector (l, t, r, b)(i,j) from every pixel of the P3 layer to the bounding box, and generating the criticality branch with a 1×1 convolution with 1 output channel to obtain the criticality probability vector k(i,j) of every pixel of the P3 layer.
4. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 3, wherein the specific construction steps of the fast grouping residual module in the step S203 are as follows:
S231, denoting the feature data input to the fast grouping residual module F0; first, three three-branch grouped convolution blocks, each consisting of a 3×3 convolution with 4 groups, a 1×1 convolution with 4 groups and a parallel residual connection, are applied consecutively to extract features, the input and output channel numbers of each convolution layer being equal, and the feature information after each pass through a three-branch grouped convolution is saved and denoted F1^(1), F1^(2) and F1^(3), respectively;
S232, adding a mixing module in which the output features of every layer are residually connected with the input features F0: F1^(n) = F1^(n) + F0, n = 1, 2, 3;
S233, concatenating the 4 layers of feature information F0, F1^(1), F1^(2) and F1^(3) to obtain a new feature F2 whose channel number is 4 times that of the input feature F0: F2 = concat(F0, F1^(1), F1^(2), F1^(3)), where concat is the concatenation operation;
S234, reducing the channel number of feature F2 to that of F0 with a 1×1 convolution and adding the result to F0 as local feature fusion to obtain the feature layer F3: F3 = δ(W(F2)) + F0, where W is a 1×1 convolution, δ is the ReLU activation function, and F3 is the output of the fast grouping residual module.
5. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 1, wherein the specific steps of the step S30 are as follows:
S301, inputting the enhanced image into the target detection network, and processing the feature maps with the single-feature anchor-free target detection network based on the fast grouping residual module to obtain the predicted values of the image;
S302, generating the positive sample points (i_pos, j_pos) of the targets in the image with the positive and negative sample optimization strategy, where i is the pixel abscissa on P3 and j is the pixel ordinate on P3;
S303, obtaining the classification loss L_cls with the calculation used in Focal Loss, obtaining the localization loss L_iou with the calculation used in IoU Loss, and calculating the criticality loss L_key; the total loss L of the network is the sum of the three branch losses:
L = L_cls + L_iou + L_key;
S304, adjusting the parameters of the target detection network according to its loss, executing steps S301 to S303, and updating the target detection network parameters until the number of iterations reaches the preset number.
6. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 5, wherein the specific steps of the positive and negative sample optimization strategy in step S302 are as follows:
S401, denoting the coordinates of a pixel point on the feature layer P3 (i, j); for an H×W input image, suppose there are K targets, each target B_k, 1 ≤ k ≤ K, consisting of its upper-left coordinate, lower-right coordinate and target-category label; calculate the area of each target and, in order of area from small to large, compute the center point of each target, mapped onto P3, as a positive sample point (i_pos, j_pos), so that the target with the smallest area is placed on P3 first;
S402, if a center point placed later collides with the positive sample point of an earlier target, searching sub-optimal points in turn around the conflict point of the larger-area target in left, up, right, down order, and ensuring that the intersection-over-union between the bounding box at the conflict-point position and the same box translated to the sub-optimal point is greater than 0.7; otherwise, giving up labelling that point;
S403, repeating step S402 until all targets are labelled, the remaining unlabelled detection points being the negative sample points (i_neg, j_neg).
7. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 6, wherein the criticality loss L_key in step S303 is calculated as follows: for any target B_k falling onto the P3 layer, according to the positive and negative sample assignment strategy of steps S401 to S403, the criticality weight of the positive sample point (i_pos, j_pos) is set to 1; the remaining negative sample points (i_neg, j_neg) are all non-key points; after translating the bounding box located at the positive sample point to a negative sample point, the intersection-over-union of the two boxes gives the ground-truth criticality weight of that non-key point; if a detection point receives different weights, the highest weight is kept; performing this operation on all points of the P3 layer, the criticality loss L_key is calculated with binary cross-entropy:
L_key = −(1/|P3|) · Σ_(i,j) [ k̂(i,j)·log k(i,j) + (1 − k̂(i,j))·log(1 − k(i,j)) ],
where k(i,j) is the criticality-branch predicted probability vector on the P3 layer obtained after the sample is input to the neural network, and k̂(i,j) is the ground-truth criticality weight.
8. The single-feature anchor-free target detection method based on a fast grouping residual module according to claim 1, wherein the specific steps of the step S50 are as follows:
S501, padding the edges of the 1×1 convolution kernel with 0 to convert it into a 3×3 convolution kernel;
S502, converting the residual information of the corresponding channel into a 1×1 convolution kernel, which is then converted into a 3×3 convolution kernel as in step S501;
S503, adding the two converted 3×3 convolution kernels and the original 3×3 convolution kernel to form a new convolution kernel, fusing each three-branch grouped convolution into a single 3×3 grouped convolution;
S504, inputting the image to be detected into the target detection network; from the classification branch of the detection head, the predicted class probability vector p(i,j,c) of each pixel point is obtained, where C is the number of classes; from the positioning branch, the position distance vector (l, t, r, b)(i,j) of each pixel point to the bounding box is obtained and mapped onto (0, +∞) with the ReLU function; from the criticality branch, the criticality probability vector k(i,j) is obtained and its value mapped onto (0, 1) with a sigmoid function;
S505, correcting the prediction result of the criticality branch with a kernel function T(x), where β and γ are regulating factors;
S506, computing the classification confidence from the kernel-corrected prediction of the criticality branch to obtain the corrected per-class confidence of each pixel point, where α is a regulating factor;
S507, performing preliminary screening with a max-pooling function of kernel size 3, selecting the top 100 prediction points by confidence from large to small, filtering out points whose confidence is lower than 0.05, computing the distances from each prediction point to the four sides with the position distance vector, and removing redundant prediction boxes with non-maximum suppression; the retained targets with their corresponding categories and bounding boxes are the network's prediction result for the input image.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
10. An electronic terminal, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to cause the terminal to perform the method according to any one of claims 1 to 8.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211693108.4A | 2022-12-28 | 2022-12-28 | Single-feature anchor-frame-free target detection method based on fast grouping residual error module
Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211693108.4A | 2022-12-28 | 2022-12-28 | Single-feature anchor-frame-free target detection method based on fast grouping residual error module
Publications (1)

Publication Number | Publication Date
---|---
CN116091823A | 2023-05-09
Family
ID=86203738
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202211693108.4A | Single-feature anchor-frame-free target detection method based on fast grouping residual error module | 2022-12-28 | 2022-12-28

Country Status (1)

Country | Link
---|---
CN | CN116091823A (en)
Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117036241A | 2023-06-25 | 2023-11-10 | 深圳大学 | Deep learning-based prostate cancer whole body detection method and related device
CN116758295A | 2023-08-15 | 2023-09-15 | 摩尔线程智能科技(北京)有限责任公司 | Key point detection method and device, electronic equipment and storage medium
CN116758295B | 2023-08-15 | 2024-06-04 | 摩尔线程智能科技(北京)有限责任公司 | Key point detection method and device, electronic equipment and storage medium
Similar Documents

Publication | Title
---|---
CN110335290B | Twin candidate region generation network target tracking method based on attention mechanism
CN109859190B | Target area detection method based on deep learning
WO2019120110A1 | Image reconstruction method and device
CN110322453B | 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN109712165B | Similar foreground image set segmentation method based on convolutional neural network
CN110059728B | RGB-D image visual saliency detection method based on attention model
CN116091823A | Single-feature anchor-frame-free target detection method based on fast grouping residual error module
CN113393457B | Anchor-frame-free target detection method combining residual error dense block and position attention
CN111768415A | Image instance segmentation method without quantization pooling
CN114419413A | Method for constructing sensing field self-adaptive transformer substation insulator defect detection neural network
CN110135446B | Text detection method and computer storage medium
CN112581462A | Method and device for detecting appearance defects of industrial products and storage medium
CN111461213A | Training method of target detection model and target rapid detection method
CN112800955A | Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN111950389B | Depth binary feature facial expression recognition method based on lightweight network
CN111798469A | Digital image small data set semantic segmentation method based on deep convolutional neural network
CN114926722A | Method and storage medium for detecting scale self-adaptive target based on YOLOv5
CN115565043A | Method for detecting target by combining multiple characteristic features and target prediction method
CN115937552A | Image matching method based on fusion of manual features and depth features
CN112967296B | Point cloud dynamic region graph convolution method, classification method and segmentation method
Zhang et al. | A small target detection algorithm based on improved YOLOv5 in aerial image
CN113627481A | Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN117151998A | Image illumination correction method based on support vector regression
CN109583584B | Method and system for enabling CNN with full connection layer to accept indefinite shape input
CN116778182A | Sketch work grading method and sketch work grading model based on multi-scale feature fusion
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination