CN116091823A

CN116091823A - Single-feature anchor-frame-free target detection method based on fast grouping residual error module

Info

Publication number: CN116091823A
Application number: CN202211693108.4A
Authority: CN
Inventors: 陶家俊; 刘伟; 胡为; 李小智
Original assignee: Hunan University of Chinese Medicine
Current assignee: Hunan University of Chinese Medicine
Priority date: 2022-12-28
Filing date: 2022-12-28
Publication date: 2023-05-09

Abstract

The invention is suitable for the technical field of computer vision deep learning, and relates to a single-feature anchor-frame-free target detection method based on a fast grouping residual error module, which comprises the following steps: s10, enhancing an image; s20, constructing a target detection network; s30, predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality; s40, calculating a classification branch error, a position branch error and a criticality branch error, and updating parameters of a target detection network until the iteration times reach preset times; s50, inputting the image to be detected into a model obtained when the iteration times reach the preset times, obtaining a target category probability vector, a target position distance vector and a target criticality probability vector of the image, adjusting the predicted category score by utilizing the criticality branches, and obtaining the final target classification confidence and the target position. The method has the advantages of simple flow and high detection precision, reduces the parameter number of the model, and improves the detection speed of the model.

Description

Single-feature anchor-frame-free target detection method based on fast grouping residual error module

Technical Field

The invention belongs to the technical field of computer vision deep learning, and particularly relates to a single-feature anchor-frame-free target detection method based on a fast grouping residual error module.

Background

Today, artificial intelligence technology is rapidly changing our world. As a dual-vision of the intelligent world, computer vision is a major branch of artificial intelligence technology by capturing digital images, video or other visual input signals and training with a deep learning model so that a computer can accurately identify and classify and then react to what it "sees". The object detection is used for positioning the position of an object frame in an image and identifying the category to which the object belongs, and is widely applied to downstream tasks such as scientific research, actual production, life and the like.

In order to improve the detection effect, the dense point prediction method represented by FCOS and FoveaBox uses a re-weighting method to improve the detection quality, and uses a FPN dividing and controlling solution, namely, manually designating different scales to enable targets to fall into corresponding feature maps to help a detector to separate overlapped targets, wherein the targets are designated to a certain layer for learning during training, which is basically another form of 'anchor frame', and each detection point of each layer has a hidden square anchor frame with fixed size. Since each feature layer needs to have a corresponding detection head, a large number of parameters result in slow detection. The patent with the publication number of CN112818964A provides an unmanned aerial vehicle detection method based on a FoveaBox anchor-free neural network, which comprises the steps of firstly setting initial parameters of a FoveaBox neural network model, inputting training set images in an unmanned aerial vehicle database into the set FoveaBox neural network model for training, and obtaining the unmanned aerial vehicle detection model based on deep learning; inputting an unmanned aerial vehicle image to be detected into a trained unmanned aerial vehicle detection model in the step one, and obtaining a characteristic diagram of the possibility of a multi-layer prediction target; and then processing the output characteristic diagram of the main network through the position sub-network, classifying the output characteristic diagram of the main network pixel by combining with the detection head sub-network, and directly detecting and acquiring the target type and the position information. The patent also adopts the foveaBox non-anchor neural network commonly used in the prior art, and the application range and the universality are ensured, but the defect still exists in the detection speed.

Therefore, how to provide a target detection method with high detection speed while ensuring the detection accuracy of the target is a problem to be solved by those skilled in the art.

Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide a single-feature anchor-frame-free target detection method based on a fast packet residual error module so as to solve the problem of low detection speed of the target detection method in the prior art; in addition, the invention also provides a single-feature anchor-frame-free target detection medium based on the fast packet residual error module and a terminal.

In order to solve the technical problems, the invention adopts the following technical scheme:

in a first aspect, the present invention provides a single-feature anchor-frame-free target detection method based on a fast packet residual module, including the steps of:

s10, carrying out image enhancement on a training sample image;

s20, constructing a single-feature anchor-frame-free target detection network based on a rapid packet residual error module, wherein the single-feature anchor-frame-free target detection network comprises a cut RepVGG network, a U-shaped feature fusion network containing the rapid packet residual error module and a detection head, and the detection head comprises a classification branch network, a position branch network and a criticality branch network;

s30, inputting the training samples enhanced in the step S10 into the target detection network in the step S20, and predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality;

s40, marking positive and negative samples by adopting a positive and negative sample optimization strategy, respectively calculating a classification branch error, a position branch error and a criticality branch error according to the vector predicted in the step S30 and the real vector, and updating parameters of the target detection network based on the classification branch error, the position branch error and the criticality branch error until the iteration times reach preset times;

s50, taking the model obtained when the iteration times reach the preset times as the trained parameters of the target detection network model, inputting the image to be detected into the target detection network to obtain a target class probability vector, a target position distance vector and a target criticality probability vector of the image, adjusting the predicted class score by utilizing the criticality branches, and obtaining the final target classification confidence and the target position.

Further, in the step S10, the image enhancement includes randomly adjusting brightness, chromaticity and contrast; cutting the image; randomly turning over the image left and right with 50% probability; randomly scaling the image with 50% probability, wherein the scaling factor is not lower than 0.5 times of the original image, and the magnification factor is not higher than 3 times of the original image; performing random mosaic stitching on the images with the probability of 30%; finally, enhanced image data I epsilon R is obtained ^H×W×3 Where H and W are the length and width, respectively, of the original image and 3 is the number of image channels.

Further, the specific steps of the step S20 are as follows:

s201, using the modified RepVGG-A0 as a backbone network, and only reserving C in the RepVGG-A0 ₃ and C₄ The step length of the two feature layers is 8 and 16 respectively, and the number of the feature map channels is 96 and 192 respectively;

s202, adjusting the channel number of the input U-shaped feature fusion network, setting the channel number of the output feature layer as N, if C ₃ Or C ₄ The number of channels is not equal to

Then its channel number is adjusted to +.1 convolution>

Respectively marked as D ₃ and D₄ Otherwise, directly recorded as D ₃ and D₄ The method comprises the steps of carrying out a first treatment on the surface of the Three-branch grouping convolution block pair C consisting of 3×3 convolutions grouped into 4 and step size of 2, 1×1 convolutions grouped into 4 and step size of 2 and residual connections in parallel ₄ Downsampling twice, denoted as D ₅ and D₆ The number of channels is ∈>

S203, feature D ₆ Inputting the fast grouping residual error module to obtain P ₆ ；

S204, using bilinear interpolation to P ₆ Upsampling the feature map P ₆ Amplified to the previous layer D ₅ The same size, then D ₅ And P ₆ The characteristic graphs with the number of channels being N are obtained by connecting the characteristic graphs in series, and finally the characteristics after the series connection are input into the fast grouping residual error module in the step S203 to obtain P ₅ ；

S205, pair D ₄ and D₃ Repeating the steps S203 to S204 to obtain P respectively ₄ and P₃ Finally only take P ₃ As a final feature layer;

s206, P pair ₃ Three-branch grouping convolution blocks with the number of N of two input and output channels are respectively used and are respectively marked as P ₃' and P₃ "; p pair P ₃ ' generating classification branches by using a 1×1 convolution with the number of output channels C, C being the number of classes, to obtain P ₃ Probability vector of each pixel point of each layer

Wherein i is P ₃ Upper pixel abscissa, +.>

j is P ₃ Upper pixel ordinate,/->

C is the class number, c= {1, how much, C; p pair P ₃ "generating positioning branches using a 1×1 convolution with 4 output channels, yielding P ₃ Position distance vector from each pixel point of layer to boundary frame +.>

The method comprises the steps of carrying out a first treatment on the surface of the For P ₃ Generating criticality branches using a 1 x 1 convolution with 1 output channel number to obtain P ₃ Probability vector of criticality of each pixel point of layer +.>

Further, the specific construction steps of the fast packet residual module in step S203 are as follows:

s231, recording the characteristic data input into the fast grouping residual error module as F ₀ Firstly, three 3X 3 convolutions with 4 groups, three branch grouping convolutions with 1X 1 convolutions with 4 groups and three branch grouping convolutions with parallel residual connection are continuously used for extracting features, the number of input channels and the number of output channels of a convolution layer are equal, and feature information after each time of passing through the three branch grouping convolutions is stored and is respectively recorded as

And->

S232, adding a confusion module, wherein each layer of output characteristics is identical to the input characteristics F ₀ Residual connection is carried out:

wherein ,

is a 3 x 3 convolution>

Is a 1 x 1 convolution, delta is a ReLU activation function;

s233, feature information F ₀ 、

And->

4 layers are connected in series to obtain a specific input characteristic F ₀ New feature F4 times greater than the number of channels ₂ ：

Wherein concat is a tandem operation;

s234, convolving the feature F with a 1X 1 convolution ₂ The number of channels is reduced to F ₀ The same size as F ₀ Adding to obtain a feature layer F as local feature fusion ₃ ：

wherein ,

is a 1 x 1 convolution, delta is a ReLU activation function, F ₃ And the output result of the fast grouping residual error module is obtained.

Further, the specific steps of the step S30 are as follows:

s301, inputting the enhanced image into the target detection network, and performing characteristic image processing by using the single-characteristic anchor-frame-free target detection network based on the fast packet residual error module to obtain a predicted value of the image;

s302, generating positive sample points of targets in the image by the positive and negative sample optimization strategy

Wherein i is P ₃ Upper pixel abscissa +.>

j is P ₃ Upper pixel ordinate +.>

S303, obtaining the classification Loss L by adopting a calculation mode in the Focal Loss _cls Obtaining IoU Loss L by adopting an IoU Loss medium calculation mode _iou Calculating a criticality loss L _key The method comprises the steps of carrying out a first treatment on the surface of the The total loss L of the network is the sum of three branch losses:

L＝L _cls +L _iou +L _key

and S304, adjusting the parameters of the target detection network according to the loss result of the target detection network, executing the steps S301 to S303, and updating the parameters of the target detection network until the iteration times reach the preset times.

Further, the specific steps of the positive and negative sample optimization strategy in step S302 are as follows:

s401, record feature layer P ₃ The coordinates of the upper pixel point are (i, j),

for a pair of H W input images, K target points are assumed, each target point B ^k K is more than or equal to 1 and less than or equal to K, and comprises the upper left coordinate, the lower right coordinate and the labeling information of the target category, which is marked as +.>

wherein

And->

Calculate eachThe target area, according to the target area from small to large, sequentially calculating the center point as positive sample point +.>

wherein

Placing the object with the smallest area at P ₃ Applying;

s402, if a center point placed later collides with a positive sample point of the former, sequentially searching secondary advantages by taking a conflict point of a target with a larger area as a circle center according to the left-upper right-lower order, ensuring that the intersection ratio of an intersection pattern formed by a bounding box formed by the conflict point position after translating to the secondary advantages is larger than 0.7, otherwise, giving up marking the point;

s403, repeating the step S402 until all target marks are completed, wherein the rest unmarked detection points are negative sample points (i _neg ,j _neg )。

Further, the criticality loss L in step S303 _key The calculation method is as follows: for any one falling into P ₃ Target B of layer ^k According to the positive and negative sample distribution strategies in the steps S401-S433, the positive sample points are made

Critical weight +.>

Remaining negative sample points (i) _neg ,j _neg ) Are all non-key points, after the boundary frame formed by the positions of the positive sample points is translated to the negative sample points, the intersection ratio of the graphs is calculated, namely the real value weight of the non-key point criticality is obtained, if different weights appear in the same detection point, the value with the highest weight is taken, and the point is matched with the point P ₃ All points of the layer perform the above operation, and the criticality loss L is calculated by adopting binary cross entropy _key Can be expressed as:

wherein ,

p obtained after inputting the sample into the neural network ₃ On-layer criticality branch predictor probability vectors.

Further, the specific steps of the step S50 are as follows:

s501, filling 0 into the edge of a 1 multiplied by 1 convolution kernel, and converting the 1 multiplied by 1 convolution kernel into a 3 multiplied by 3 convolution kernel;

s502, residual information of a corresponding channel is converted into a 1 multiplied by 1 convolution kernel, and then the 1 multiplied by 1 convolution kernel is converted into a 3 multiplied by 3 convolution kernel according to the step S51;

s503, adding the converted two 3X 3 convolution kernels and the original 3X 3 convolution kernels to form a new convolution kernel, and fusing each three-branch grouping convolution into a 3X 3 grouping convolution;

s504, inputting the image to be detected into the target detection network, and obtaining predicted probability vectors of each pixel point from the classification branch of the detection head

wherein

C is the category number; obtaining the position distance vector +.>

Mapping to (0 with the ReLU function, + -infinity); obtaining a criticality probability vector +.>

The value is then mapped onto (0, 1) using a sigmoid function;

s505, predicting result of critical branch by using kernel function T (x)

And (3) correcting:

wherein β and γ are regulatory factors;

s506, classification confidence

Calculating by using the prediction result of the critical degree branch corrected by the kernel function to obtain the confidence degree of each category of each corrected pixel point>

Wherein α is a modulator;

s507, performing primary screening by using a maximum pooling function with a convolution kernel size of 3, thereby

Selecting the first 100 predicted points with the confidence coefficient from large to small, filtering out points with the confidence coefficient lower than 0.05, calculating the distances from the predicted points to four sides by using the position distance vector, and removing redundant predicted frames by using a non-maximum value inhibition method; the retained targets and the corresponding categories and bounding boxes are the prediction results of the network on the input image.

In a second aspect, the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method as described above.

In a third aspect, the present invention further provides an electronic terminal, including: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the terminal executes the method.

Compared with the prior art, the single-feature anchor-frame-free target detection method based on the rapid grouping incomplete module has at least the following beneficial effects:

the method has simple flow and convenient operation, and the detection precision is effectively improved by adopting the U-shaped characteristic fusion network containing the fast grouping residual error module; the single feature diagram with the step length of 8 is used for prediction, the number of detection points is reduced by 4 times compared with that of the single feature diagram with the step length of 4, and a re-parameterization method is adopted for merging model parameters and improving the detection speed during reasoning; adopting a positive and negative sample optimization strategy to avoid the condition that the detection point sample marks are ambiguous due to the reduction of the feature layers; the influence of the low-quality non-key points of the edge on the detection effect is reduced by adopting the criticality branch; the method can effectively improve the detection precision, reduce the model parameter quantity and accelerate the detection speed, and has larger precision and speed improvement compared with the existing algorithm, and good application prospect in the field of target detection.

Drawings

In order to more clearly illustrate the solution of the invention, a brief description will be given below of the drawings required for the description of the embodiments, it being apparent that the drawings in the following description are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a single-feature anchor-frame-free target detection method based on a fast packet residual error module according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a U-shaped feature fusion network containing a fast packet residual module according to a single feature anchor-free frame target detection method based on the fast packet residual module according to an embodiment of the present invention;

fig. 3 is a schematic diagram of a fast packet residual module according to a single-feature anchor-frame-free target detection method based on the fast packet residual module according to an embodiment of the present invention.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The invention provides a single-feature anchor-free frame target detection method based on a fast packet residual error module, which is applied to a target detection process, and comprises the following steps of:

s10, carrying out image enhancement on a training sample image;

s20, constructing a single-feature anchor-frame-free target detection network based on a rapid packet residual error module, wherein the single-feature anchor-frame-free target detection network comprises a cut RepVGG network, a U-shaped feature fusion network containing the rapid packet residual error module and a detection head, and the detection head comprises a classification branch network, a position branch network and a criticality branch;

s30, inputting the training sample enhanced in the step S10 into a target detection network in the step S20, and predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality;

s40, marking positive and negative samples by adopting a positive and negative sample optimization strategy, respectively calculating a classification branch error, a position branch error and a criticality branch error according to the vector predicted in the step S30 and the real vector, and updating parameters of a target detection network based on the classification branch error, the position branch error and the criticality branch error until the iteration times reach preset times;

s50, taking the model obtained when the iteration times reach the preset times as parameters of a trained target detection network model, inputting the image to be detected into a target detection network to obtain a target category probability vector, a target position distance vector and a target criticality probability vector of the image, adjusting the predicted category score by utilizing the criticality branches, and obtaining final target classification confidence and target positions.

The method has the advantages of simple flow and high detection precision, reduces the parameter number of the model, and improves the detection speed of the model.

In order to make the person skilled in the art better understand the solution of the present invention, the technical solution of the embodiment of the present invention will be clearly and completely described below with reference to the accompanying drawings.

The data set used by the single-feature anchor-frame-free target detection method based on the fast packet residual error module is a combined set of VOC2007 and VOC2012, the target class is 20 types, the used platform is a Windows Server 2019 operating system, the CPU is Intel (R) Xeon (R) Gold 6226RCPU, the GPU is a block Nvidia GeForce RTX 3060, and the model of the embodiment is trained under a Pytorch 1.9 deep learning framework based on CUDA 11.3.109 and CUDNN 8.2.1.32 versions, the embodiment uses RepVGG to pretrain weight files on ImageNet to initialize a backbone network, adopts Adam to optimize the network, sets the batch number as 16, sets 260 generations in total, wherein the 1 st generation adopts a warmup technology, and the learning rate is 10 degrees ^-5 Gradually rise to 10 ^-3 After 100 generations, the learning rate is kept to be 10 ^-3 Gradually reducing learning rate by cosine annealing function until the 200 th generation is reduced to 10 ^-4 Finally, gradually reducing to 5 by adopting a cosine annealing function ^-6 Until the training is finished.

The invention provides a single-feature anchor-free frame target detection method based on a fast packet residual error module, which is applied to a target detection process and is combined with fig. 1 to 3, and in the embodiment, the single-feature anchor-free frame target detection method based on the fast packet residual error module comprises the following steps:

s10, image enhancement: image enhancement is carried out on the training sample picture, and the method comprises the following steps: randomly adjusting brightness, chromaticity and contrast; randomly cutting the image; randomly turning over the image left and right with 50% probability; randomly scaling the image with a 50% probability; performing random mosaic stitching on the images with the probability of 30%; finally, enhanced image data I epsilon R is obtained ⁵¹² ^×512×3 。

S20, constructing a network: constructing a single-feature anchor-frame-free target detection network based on a rapid packet residual error module, wherein a backbone network is a cut RepVGG network, a feature fusion network is a U-shaped feature fusion network containing the rapid packet residual error module, and detection heads are three branch networks which are a classification branch network, a position branch network and a criticality branch respectively;

specifically, in this embodiment, the specific steps of step S20 are as follows:

s202, adjusting the number of channels of the input U-shaped feature fusion network; let the number of output feature layer channels be 128, if C ₃ Or C ₄ If the number of channels is not equal to 64, the number of channels is adjusted to 64 by using 1×1 convolution, and is respectively denoted as D ₃ and D₄ The method comprises the steps of carrying out a first treatment on the surface of the Three-branch grouping convolution block pair C consisting of 3×3 convolutions grouped into 4 and step size of 2, 1×1 convolutions grouped into 4 and step size of 2 and residual connections in parallel ₄ Downsampling twice, denoted as D ₅ and D₆ The number of the channels is 64;

S204, using bilinear interpolation to P ₆ Upsampling the feature map P ₆ Amplified to the previous layer D ₅ The same size, then D ₅ And P ₆ The feature images with 128 channels are obtained by series connection, and finally the features after series connection are input into the fast in the step S203Fast packet residual module gets P ₅ ；

S205, pair D ₄ and D₃ Repeating steps S203 to S204 to obtain P respectively ₄ and P₃ Finally only take P ₃ As a final feature layer;

s206, generating three network branches; p pair P ₃ Three-branch grouping convolution blocks with 128 number of input and output channels are used respectively and are marked as P respectively ₃' and P₃ "; p pair P ₃ ' generating classification branches using a 1×1 convolution with 20 output channels, yielding P ₃ Probability vector of each pixel point of each layer

P pair P ₃ "generating positioning branches using a 1×1 convolution with 4 output channels, yielding P ₃ Position distance vector from each pixel point of layer to boundary frame +.>

P pair P ₃ "generating criticality branches using a 1×1 convolution with 1 output channel number, yielding P ₃ Probability vector of criticality of each pixel point of layer +.>

S30, training: inputting the training sample enhanced in the step S10 into a single-feature anchor-frame-free target detection network based on a rapid packet residual error module constructed in the step S20, and predicting to obtain a target category probability vector, a target position distance vector and a probability vector of target criticality;

specifically, in the present embodiment, the step of step S30 is as follows:

s301, inputting the enhanced image into a target detection network, and performing characteristic image processing by using a single-characteristic anchor-frame-free target detection network based on a fast packet residual error module to obtain a predicted value of the image;

s302, generating positive sample points of targets in images by positive and negative sample optimization strategies

Wherein i is P ₃ The upper pixel point abscissa i= {0,1, and (64) j is P ₃ The upper pixel ordinate j= {0,1, carrying out the process of 64;

s303, calculating loss with the image predicted value: obtaining the classification Loss L by adopting a calculation mode in Focal Loss _cls Obtaining IoU Loss L by adopting an IoU Loss medium calculation mode _iou Calculating a criticality loss L _key The method comprises the steps of carrying out a first treatment on the surface of the Calculating the sum of three branch losses by adopting a formula (6);

s304, adjusting network parameters of the target detection network according to the loss result of the target detection network, executing steps S301 to S303, carrying out back propagation, and updating the target detection network parameters until the iteration times reach the preset times.

S40, calculating loss: marking positive and negative samples by adopting a positive and negative sample optimization strategy; calculating a classification branch error according to the predicted class probability vector of each training sample and the real class probability vector of each training sample; calculating a position branch error according to the predicted position distance vector of each training sample and the real position distance vector of each training sample; using the formula

Calculating a supervised branch error according to the probability vector of the predicted criticality of each training sample and the probability vector of the true criticality of each training sample; based on the three errors, the target detection network parameters are reversely propagated and updated until the iteration times reach the preset times.

S50, reasoning: taking a model obtained when the iteration times reach the preset times as a parameter of a trained target detection network model, inputting an image to be detected into a target detection network to obtain a target category probability vector, a target position distance vector and a target criticality probability vector of the image, adjusting a predicted category score by utilizing a criticality branch, and obtaining final target classification confidence and a target position;

specifically, in this embodiment, the specific steps of step S50 are as follows:

s502, for residual connection, residual information of a corresponding channel is converted into a 1×1 convolution kernel, and then the 1×1 convolution kernel is converted into a 3×3 convolution kernel according to the method in the step S501;

s503, adding the two converted 3X 3 convolution kernels and the original 3X 3 convolution kernels to form a new convolution kernel, so that each three-branch grouping convolution is fused to form a 3X 3 grouping convolution;

s504, inputting the image to be detected into a target detection network, and obtaining predicted probability vectors of each pixel point class from the classification branch of the detection head

From the positioning branch of the detection head, the position distance vector of each pixel point to the boundary frame is obtained>

The value is then mapped onto (0, 1) using a sigmoid function;

s505, utilizing formula

The prediction of critical branches by the kernel function T (x)>

Correcting, namely taking beta=2 and gamma=1;

s506, utilizing formula

Obtaining the confidence coefficient of each category of each corrected pixel point

Taking α=0.6;

s507, performing preliminary screening by using a maximum pooling function with a convolution kernel size of 3, and then performing secondary screening

Selecting the first 100 predicted points with the confidence coefficient from large to small, and filtering out points with the confidence coefficient lower than 0.05; finally, calculating the distances from the predicted point to the four sides by using the position distance vector, and removing redundant predicted frames by using a non-maximum value inhibition method; the retained targets and the corresponding categories and bounding boxes are the prediction results of the network on the input image.

The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements any of the methods of the embodiment.

The embodiment of the invention also provides an electronic terminal, which comprises: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the terminal executes any one of the methods in the embodiment.

The computer readable storage medium in this embodiment, as will be appreciated by those of ordinary skill in the art: all or part of the steps for implementing the method embodiments described above may be performed by computer program related hardware. The aforementioned computer program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

The electronic terminal provided in this embodiment includes a processor, a memory, a transceiver, and a communication interface, where the memory and the communication interface are connected to the processor and the transceiver and complete communication with each other, the memory is used to store a computer program, the communication interface is used to perform communication, and the processor and the transceiver are used to run the computer program, so that the electronic terminal performs each step of the above method.

It is apparent that the above-described embodiments are merely preferred embodiments of the present invention, not all of which are shown in the drawings, which do not limit the scope of the invention. This invention may be embodied in many different forms, but rather, embodiments are provided in order to provide a thorough and complete understanding of the present disclosure. Although the invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described in the foregoing description, or equivalents may be substituted for elements thereof. All equivalent structures made by the content of the specification and the drawings of the invention are directly or indirectly applied to other related technical fields, and are also within the scope of the invention.

Claims

1. The single-feature anchor-frame-free target detection method based on the fast packet residual error module is characterized by comprising the following steps of:

s10, carrying out image enhancement on a training sample image;

2. The method for single-feature anchor-free target detection based on the fast packet residual module according to claim 1, wherein in the step S10, the image enhancement includes randomly adjusting brightness, chromaticity and contrast; cutting the image; randomly turning over the image left and right with 50% probability; randomly scaling the image with 50% probability, wherein the scaling factor is not lower than 0.5 times of the original image, and the magnification factor is not higher than 3 times of the original image; performing random mosaic stitching on the images with the probability of 30%; finally, enhanced image data I epsilon R is obtained ^H×W×3 Where H and W are the length and width, respectively, of the original image and 3 is the number of image channels.

3. The single-feature anchor-free frame target detection method based on the fast packet residual module according to claim 1, wherein the specific steps of the step S20 are as follows:

Then its channel number is adjusted to +.1 convolution>

Wherein i is P ₃ Upper pixel abscissa, +.>

j is P ₃ Upper pixel ordinate,/->

C is the class number, c= {1, how much, C; generating a positioning branch for P3' by using a 1X 1 convolution with the output channel number of 4 to obtain P ₃ Position distance vector from each pixel point of layer to boundary frame +.>

P pair P ₃ "generating criticality branches using a 1×1 convolution with 1 output channel number, yielding P ₃ Probability vector of criticality of each pixel point of layer

4. The single-feature anchor-frame-free target detection method based on the fast packet residual module according to claim 3, wherein the specific construction steps of the fast packet residual module in the step S203 are as follows:

And->

wherein ,

is a 3 x 3 convolution>

Is a 1 x 1 convolution, delta is a ReLU activation function;

s233, feature information F ₀ 、

And->

4 layers are connected in series to obtain a specific input characteristic F ₀ Novel 4 times greater channel number feature

Wherein concat is a tandem operation;

wherein ,

5. The single-feature anchor-free frame target detection method based on the fast packet residual module according to claim 1, wherein the specific steps of the step S30 are as follows:

Wherein i is P ₃ Upper pixel abscissa +.>

j is P ₃ Upper pixel ordinate +.>

S303, obtaining the classification loss by adopting a calculation mode in FocalLoss

Obtaining IoU Loss L by adopting an IoU Loss medium calculation mode _iou Calculate the criticality loss->

The total loss L of the network is the sum of three branch losses:

L＝L _cls +L _iou +L _key

6. The single-feature anchor-free frame target detection method based on the fast packet residual module according to claim 5, wherein the positive and negative sample optimization strategy in step S302 specifically comprises the following steps:

wherein

And->

Calculating the target areas, and sequentially calculating the center point of the target areas from small to large as a positive sample point +.>

wherein

Placing the object with the smallest area at P ₃ Applying;

7. The method for single-feature anchor-free frame target detection based on the fast packet residual module according to claim 6, wherein the criticality loss L in step S303 _key The calculation method is as follows: for any one falling into P ₃ Target B of layer ^k According to the positive and negative sample distribution strategies in the steps S401-S433, the positive sample points are made

Is the criticality weight of (2)

Remaining negative sample points (i) _neg ,j _neg ) All are non-key points, after the bounding box formed by the positive sample point position is translated to the negative sample point, the intersection ratio of the graph is calculated, namely the real value weight of the non-key point criticality,if the same detection point has different weights, the highest weight value is taken for P ₃ All points of the layer perform the above operation, and the criticality loss L is calculated by adopting binary cross entropy _key Can be expressed as:

wherein ,

8. The single-feature anchor-free frame target detection method based on the fast packet residual module according to claim 1, wherein the specific steps of the step S50 are as follows:

wherein

C isA category number; obtaining the position distance vector +.>

The value is then mapped onto (0, 1) using a sigmoid function;

s505, predicting result of critical branch by using kernel function T (x)

And (3) correcting:

wherein β and γ are regulatory factors;

s506, classification confidence

Wherein α is a modulator;

9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.

10. An electronic terminal, comprising: a processor and a memory;

the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to cause the terminal to perform the method according to any one of claims 1 to 8.