CN115311553A - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN115311553A
Authority
CN
China
Prior art keywords: target, detected, sample, features, feature
Prior art date
Legal status
Pending
Application number
CN202210819568.0A
Other languages
Chinese (zh)
Inventor
刘文龙
曾卓熙
肖嵘
王孝宇
Current Assignee
Qingdao Yuntian Lifei Technology Co ltd
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Qingdao Yuntian Lifei Technology Co ltd
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Yuntian Lifei Technology Co ltd, Shenzhen Intellifusion Technologies Co Ltd filed Critical Qingdao Yuntian Lifei Technology Co ltd
Priority: CN202210819568.0A
Publication: CN115311553A
PCT application: PCT/CN2022/143514 (WO2024011873A1)

Classifications

    • G06V20/10 Terrestrial scenes
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/07 Target detection


Abstract

An embodiment of the invention provides a target detection method comprising the following steps: acquiring an image to be detected, where the image to be detected contains a rotating target to be detected; performing feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected; and performing auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected. Because the classification features and the auxiliary features of the rotating target are extracted, and the auxiliary features are used to assist in processing the classification features to obtain the detection result, no anchors need to be designed, so non-maximum suppression is not required, which improves the detection accuracy and overall performance of rotating target detection.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a target detection method and device, an electronic device, and a storage medium.
Background
Rotating target detection refers to detecting targets that have a rotation direction: in addition to the center point, width and height of the target, its angle must also be detected. It is common in target detection on top-view images, such as remote sensing images and aerial images. Current rotating target detection is usually based on anchors and non-maximum suppression. Because rotating targets vary in size and position within an image, anchor-based detection algorithms have inherent drawbacks: to detect rotating targets of all sizes and positions, anchor design becomes very complicated, since many aspect ratios and scales must be designed, and if the ratios and sizes of the anchors are designed poorly, the subsequent non-maximum suppression is affected, causing error accumulation and degrading the detection accuracy of the target detection model. Existing rotating target detection algorithms therefore suffer from low detection accuracy.
Disclosure of Invention
The embodiments of the invention provide a target detection method aimed at solving the low detection accuracy of rotating target detection algorithms in existing target detection. By extracting the classification features and auxiliary features of the rotating target to be detected and using the auxiliary features to assist in processing the classification features, the detection result is obtained without designing anchors, so non-maximum suppression is not required, which improves the detection accuracy and overall performance of rotating target detection.
In a first aspect, an embodiment of the present invention provides a target detection method for detecting a rotating target, the method including:
acquiring an image to be detected, wherein the image to be detected comprises a rotating target to be detected;
performing feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotary target to be detected;
and carrying out auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the to-be-detected rotating target.
Optionally, the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network, and the trained target detection model performs feature extraction on the image to be detected to obtain the classification feature and the auxiliary feature of the to-be-detected rotating target, including:
extracting the features of the image to be detected through the feature extraction network to obtain the multi-scale features of the image to be detected;
performing feature fusion on the multi-scale features through the feature fusion network to obtain fusion features of the image to be detected;
predicting the fusion characteristics through the classification characteristic output network to obtain the classification characteristics of the to-be-detected rotating targets, wherein the classification characteristics comprise characteristic channels, and the to-be-detected rotating targets of different classes correspond to different characteristic channels;
and predicting the fusion characteristics through the auxiliary characteristic output network to obtain the auxiliary characteristics of the to-be-detected rotating target.
Optionally, the auxiliary features include a height-width feature and a rotation angle feature, and the performing auxiliary processing on the classification feature based on the auxiliary features to obtain a detection result of the to-be-detected rotating target includes:
extracting key points of the classification features to obtain target key points of the to-be-detected rotating target;
based on the target key points, indexing a corresponding target height and width attribute in the height and width features and indexing a corresponding target rotation angle attribute in the rotation angle features;
and obtaining a detection result of the to-be-detected rotating target based on the target key point, the target height and width attribute and the target rotating angle attribute.
Optionally, before the trained target detection model performs feature extraction on the image to be detected to obtain the classification feature and the auxiliary feature of the to-be-detected rotating target, the method further includes:
acquiring a training data set, wherein the training data set comprises a sample image and an annotation frame, the sample image comprises a sample rotating target, and the annotation frame is an annotation frame of the sample rotating target;
and acquiring a target detection model, and training the target detection model through the training data set to obtain the trained target detection model, wherein the target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network and a rotation angle feature output network.
Optionally, the obtaining a target detection model and training the target detection model through the training data set to obtain a trained target detection model includes:
inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target;
respectively coding the sample detection frame and the labeling frame through preset coding functions to respectively obtain sample function distribution corresponding to the sample detection frame and labeling function distribution corresponding to the labeling frame;
adjusting network parameters of the target detection model according to the measurement distance between the sample function distribution and the labeling function distribution;
and iterating the network parameter adjusting process of the target detection model until the target detection model converges or reaches a preset iteration number, so as to obtain the trained target detection model.
Optionally, the inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotation target includes:
processing the sample image through the target detection model to obtain a sample feature map corresponding to the sample image, and constructing a matrix grid for the sample feature map according to the height and the width of the sample feature map, wherein the sample feature map comprises a classification feature map, a height-width feature map and a rotation angle feature map corresponding to the sample image;
establishing an index of a height and width attribute at each grid point in the height and width characteristic diagram, and establishing an index of a rotation angle attribute at each grid point in the rotation angle characteristic diagram;
and obtaining a sample detection frame corresponding to the sample rotating target according to each grid point in the sample characteristic diagram and the index attribute corresponding to the grid point.
Optionally, the labeling frame includes a labeling key point, and the adjusting of the network parameters of the target detection model according to the metric distance between the sample function distribution and the labeling function distribution includes:
calculating a first loss between the sample key point and the labeling key point according to the sample key point of the classification feature map;
converting the measurement distance into a second loss through a preset conversion function;
and adjusting network parameters of the target detection model based on the first loss and the second loss.
In a second aspect, an embodiment of the present invention provides an object detection apparatus, where the apparatus includes:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an image to be detected, and the image to be detected comprises a rotating target to be detected;
the extraction module is used for extracting the characteristics of the image to be detected through the trained target detection model to obtain the classification characteristics and the auxiliary characteristics of the rotating target to be detected;
and the processing module is used for carrying out auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the to-be-detected rotating target.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the target detection method provided by the embodiments of the invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the target detection method provided by the embodiments of the invention.
In the embodiment of the invention, an image to be detected is acquired, where the image to be detected contains a rotating target to be detected; feature extraction is performed on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected; and auxiliary processing is performed on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected. Because the classification features and the auxiliary features are extracted, and the auxiliary features are used to assist in processing the classification features to obtain the detection result, no anchors need to be designed, so non-maximum suppression is not required, which improves the detection accuracy and overall performance of rotating target detection.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that a person skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a target detection model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention, as shown in fig. 1, the target detection method is used for detecting a rotating target, and the target detection method includes the following steps:
101. and acquiring an image to be detected.
In an embodiment of the present invention, the image to be detected includes a rotating target to be detected. The image to be detected may be a side image, a top-view image, a bottom-view image, or the like: a side image is captured from the side of an object, a top-view image from above it, and a bottom-view image from below it.
The rotating target to be detected may be a physical object such as a person, a vehicle, an airplane, a building, or an article.
102. And performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and the auxiliary features of the rotary target to be detected.
In the embodiment of the invention, the image to be detected can be input into a trained target detection model, and the characteristic extraction is carried out on the image to be detected through the target detection model, so as to obtain the classification characteristic and the auxiliary characteristic of the rotary target to be detected.
In one possible embodiment, the image to be detected may be pre-processed before being input into the trained target detection model. The pre-processing may include pixel normalization and scaling the image to a size of H0 × W0, where H0 and W0 are integer multiples of 32.
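A minimal pre-processing sketch is shown below; the [0, 1] normalization and the 608 × 608 default size are assumptions, since the description only requires the side lengths to be multiples of 32:

```python
import cv2  # assumption: OpenCV is used for resizing
import numpy as np

def preprocess(image: np.ndarray, h0: int = 608, w0: int = 608) -> np.ndarray:
    """Normalize pixels and scale the image to H0 x W0 (both multiples of 32)."""
    assert h0 % 32 == 0 and w0 % 32 == 0, "model expects sides divisible by 32"
    resized = cv2.resize(image, (w0, h0))      # cv2.resize takes (width, height)
    return resized.astype(np.float32) / 255.0  # pixel normalization to [0, 1]
```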
The classification features of the to-be-detected rotating target include category information of the to-be-detected rotating target, for example, the category of the to-be-detected rotating target is a person, a vehicle, an airplane, a building, an article, and the like. The auxiliary characteristics of the to-be-detected rotating target may include attribute information such as height and width of the to-be-detected rotating target and a rotation angle.
Specifically, the trained target detection model includes a classification feature branch and an auxiliary feature branch. The model first extracts common features from the image to be detected, then outputs the corresponding classification features through the classification feature branch and the corresponding auxiliary features through the auxiliary feature branch. The two branches have different structural parameters.
Further, the target detection model may be built on a deep convolutional neural network, and the trained target detection model is obtained by training that network. Specifically, sample images are collected, each containing a sample rotating target such as a person, vehicle, airplane, building, or article, and the sample rotating targets are labeled to obtain corresponding label data. The labels include category labels corresponding to the classification features and attribute labels corresponding to the auxiliary features, where the attribute labels may include height-width labels and rotation angle labels. The deep convolutional neural network is trained on the sample images and their label data so that it learns to output the classification features and auxiliary features of rotating targets; the trained target detection model is obtained once training is complete.
103. And performing auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the rotary target to be detected.
In the embodiment of the present invention, the auxiliary features of the rotating target to be detected may include attribute information such as its height, width, and rotation angle; the attribute information indexed through the auxiliary features can be attached to the classification features to obtain the detection result of the rotating target to be detected.
The detection result may include the position, type, height, width, and rotation angle of the rotation target to be detected.
In the embodiment of the invention, an image to be detected is acquired, where the image to be detected contains a rotating target to be detected; feature extraction is performed on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected; and auxiliary processing is performed on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected. Because the classification features and the auxiliary features are extracted, and the auxiliary features are used to assist in processing the classification features to obtain the detection result, no anchors need to be designed, so non-maximum suppression is not required, which improves the detection accuracy and overall performance of rotating target detection.
Optionally, the trained target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network and an auxiliary feature output network, and in the step of performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and the auxiliary features of the rotating target to be detected, the feature extraction is performed on the image to be detected through the feature extraction network to obtain the multi-scale features of the image to be detected; performing feature fusion on the multi-scale features through a feature fusion network to obtain fusion features of the image to be detected; predicting the fusion characteristics through a classification characteristic output network to obtain classification characteristics of the to-be-detected rotating targets, wherein the classification characteristics comprise characteristic channels, and the to-be-detected rotating targets of different classes correspond to different characteristic channels; and predicting the fusion characteristics through an auxiliary characteristic output network to obtain the auxiliary characteristics of the to-be-detected rotating target.
In the embodiment of the present invention, the feature extraction network may be a backbone network such as VGG19, ResNet, or MobileNet. The feature extraction network extracts features of the image to be detected at different scales to obtain its multi-scale features. Because of the down-sampling layers in the feature extraction network, the deeper the computation, the smaller the scale of the extracted features.
The feature fusion network may include an upsampling layer and a fusion layer: small-scale features are upsampled to a larger scale through the upsampling layer and then fused with features of the same scale through the fusion layer. Specifically, the feature fusion network takes the multi-scale features from different stages of the feature extraction network, up-samples the small-scale features by a factor of 2 one by one, fuses them with the same-scale features extracted by the feature extraction network, and finally outputs the high-resolution fused features to the prediction networks.
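The top-down fusion just described can be sketched as follows; the element-wise addition and the assumption that channel counts already match are simplifications (real fusion layers typically include convolutions):

```python
import torch
import torch.nn.functional as F

def fuse_multiscale(feats: list) -> torch.Tensor:
    """Fuse backbone features ordered large-scale -> small-scale by
    repeatedly 2x-upsampling the deeper map and adding it to the
    same-scale backbone map (channel counts assumed to match)."""
    fused = feats[-1]                       # deepest, smallest-scale feature
    for same_scale in reversed(feats[:-1]):
        fused = F.interpolate(fused, scale_factor=2.0, mode="nearest")
        fused = fused + same_scale          # element-wise fusion
    return fused                            # highest-resolution fused feature
```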
The classification feature output network and the auxiliary feature output network may also be called prediction networks. The classification feature output network outputs the corresponding classification features, which may comprise multiple feature channels, each corresponding to one category of rotating target to be detected.
Specifically, the classification feature output network may be a CenterNet-based classification output network. The fused features are input into it, and center-point heatmaps of the rotating targets to be detected are predicted as the classification features; heatmaps of different categories are assigned to different feature channels, so the category of a rotating target to be detected can be determined from its feature channel.
The auxiliary feature output network may be a CenterNet-based attribute output network. The fused features are input into it, and the attribute information corresponding to each center point is predicted as the auxiliary features, where the scale resolution of the auxiliary features is the same as that of the classification features. The auxiliary features may include attribute information such as the height, width, and rotation angle of the target to be detected. In the auxiliary features, each position point corresponds to a set of attribute information, and the attribute information at a position can be indexed through the center-point position in the classification features.
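A sketch of such prediction heads is given below; the single 1 × 1 convolutions and the head names are illustrative assumptions (CenterNet-style heads usually stack a 3 × 3 convolution before the 1 × 1):

```python
import torch
import torch.nn as nn

class PredictionHeads(nn.Module):
    """CenterNet-style heads as described: a per-class center-point heatmap
    plus auxiliary height-width and angle maps at the same resolution."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.cls_head = nn.Conv2d(in_ch, num_classes, 1)  # one channel per class
        self.wh_head = nn.Conv2d(in_ch, 2, 1)             # height-width map
        self.angle_head = nn.Conv2d(in_ch, 1, 1)          # rotation-angle map

    def forward(self, fused: torch.Tensor):
        heatmap = torch.sigmoid(self.cls_head(fused))     # center-point heatmap
        return heatmap, self.wh_head(fused), self.angle_head(fused)
```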
Optionally, the auxiliary features include height and width features and rotation angle features, and in the step of performing auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the to-be-detected rotating target, key point extraction may be performed on the classification features to obtain target key points of the to-be-detected rotating target; based on the target key points, indexing a target height and width attribute corresponding to the height and width characteristic and indexing a target rotation angle attribute corresponding to the rotation angle characteristic; and obtaining a detection result of the to-be-detected rotating target based on the target key point, the target height and width attribute and the target rotating angle attribute.
In an embodiment of the present invention, the auxiliary features include a height-width feature and a rotation angle feature: the height-width feature corresponds to the height-width attribute of the rotating target to be detected, and the rotation angle feature corresponds to its rotation angle attribute. Each position point in the classification feature corresponds to a position point in the height-width feature and a position point in the rotation angle feature, so the target key point in the classification feature is used to index the target height-width attribute at the corresponding position in the height-width feature and the rotation angle attribute at the corresponding position in the rotation angle feature.
Specifically, the height-width feature and the rotation angle feature have the same scale resolution as the classification feature: if the classification feature has height H and width W, then the height-width feature and the rotation angle feature are likewise of height H and width W. The classification feature can be a center-point heatmap, whose center point is the position with the highest heat value; this center point can serve as a target key point. Specifically, after the classification features are obtained, they may be sampled through an n × n max-pooling kernel, and key points whose confidence exceeds a preset threshold are taken as target key points, where n < H and n < W. The max-pooling kernel samples the maximum value within each n × n region of the heatmap; for example, with n = 3, the highest heat value within each 3 × 3 region is taken as the sample value of the kernel.
After the target key point (i, j) is obtained, where (i, j) is the position coordinate of the key point in the classification feature, the corresponding target height-width attribute (w, h) is indexed in the height-width feature, where w is the width and h is the height of the rotating target to be detected in the classification feature. Likewise, the corresponding target rotation angle attribute θ is indexed in the rotation angle feature, where θ is the rotation angle of the rotating target to be detected in the classification feature.
The detection result (i, j, w, h, θ) of the rotating target to be detected is then obtained from the target key point (i, j), the target height-width attribute (w, h), and the target rotation angle attribute θ.
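The decoding step can be sketched as follows, assuming batch size 1 and a confidence threshold of 0.3 (the threshold value is not specified in the text):

```python
import torch
import torch.nn.functional as F

def decode_keypoints(heatmap, wh_map, angle_map, thresh=0.3, n=3):
    """A point is a target key point if it survives n x n max pooling
    (i.e. it is a local maximum) and exceeds the confidence threshold;
    its (w, h) and theta are then indexed at the same (i, j)."""
    pooled = F.max_pool2d(heatmap, n, stride=1, padding=n // 2)
    peaks = (heatmap == pooled) & (heatmap > thresh)       # local maxima only
    results = []
    cls, ys, xs = torch.nonzero(peaks[0], as_tuple=True)   # batch size 1 assumed
    for c, j, i in zip(cls, ys, xs):
        w, h = wh_map[0, :, j, i]       # index the height-width attributes
        theta = angle_map[0, 0, j, i]   # index the rotation-angle attribute
        results.append((i.item(), j.item(), w.item(), h.item(),
                        theta.item(), c.item()))
    return results                      # list of (i, j, w, h, theta, class)
```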
In a possible embodiment, the auxiliary features may further include a bias feature describing the offset of the target key point. Specifically, the bias feature may be obtained by adding a bias feature output network to the target detection model and predicting the fused features through it. The bias feature also has the same scale resolution as the classification feature, with height H and width W, and each position point in the classification feature corresponds to a position point in the bias feature. The corresponding target bias attribute (dx, dy) can be indexed in the bias feature by the target key point (i, j), giving the final position (x, y) of the rotating target to be detected, where x = i + dx and y = j + dy. Combined with the target height-width attribute (w, h) and the target rotation angle attribute θ, the detection result (x, y, w, h, θ) is obtained. Indexing the target bias attribute in the bias feature via the target key point allows the position of the rotating target to be determined more accurately.
Optionally, before feature extraction is performed on an image to be detected through a trained target detection model to obtain classification features and auxiliary features of a rotating target to be detected, a training data set can be obtained, wherein the training data set comprises a sample image and an annotation frame, the sample image comprises a sample rotating target, and the annotation frame is an annotation frame of the sample rotating target; and acquiring a target detection model, and training the target detection model through a training data set to obtain the trained target detection model, wherein the target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network and a rotation angle feature output network.
In the embodiment of the invention, the target detection model can be trained before it is used for feature extraction on the image to be detected. Sample images containing sample rotating targets can be collected and labeled to obtain a training data set, where the sample rotating targets belong to the same categories as the rotating targets to be detected. The rotating targets can be labeled in the sample images by experts to obtain annotation frames, where an annotation frame includes the category of the sample rotating target and the position, height, width, and rotation angle of the frame. It should be noted that the classification feature output network, the height-width feature output network, the rotation angle feature output network, and the bias feature output network are all independent, parallel branch networks.
Specifically, please refer to fig. 2, which is a schematic structural diagram of a target detection model according to an embodiment of the present invention. As shown in fig. 2, the output of the feature extraction network is connected to the input of the feature fusion network, and the output of the feature fusion network is connected to the inputs of the classification feature output network, the height-width feature output network, the rotation angle feature output network, and the bias feature output network, respectively.
The target detection model is trained on the training data set, iteratively adjusting the network parameters of the feature extraction network, feature fusion network, classification feature output network, height-width feature output network, rotation angle feature output network, and bias feature output network until the model converges or a preset number of iterations is reached, yielding the trained target detection model.
Optionally, in the step of obtaining a target detection model and training the target detection model through a training data set to obtain a trained target detection model, a sample image may be input to the target detection model to obtain a sample detection frame corresponding to a sample rotation target; respectively coding the sample detection frame and the marking frame through preset coding functions to respectively obtain sample function distribution corresponding to the sample detection frame and marking function distribution corresponding to the marking frame; according to the measurement distance between the sample function distribution and the labeling function distribution, network parameter adjustment is carried out on the target detection model; and iterating the network parameter adjusting process of the target detection model until the target detection model converges or reaches a preset iteration number, so as to obtain the trained target detection model.
In the embodiment of the invention, during training the sample image can be input into the target detection model to obtain the sample detection frame it outputs. The sample detection frame can be obtained from the predictions of the classification feature output network, height-width feature output network, rotation angle feature output network, and bias feature output network in the model. After the sample detection frame is obtained, the loss between the sample detection frame and the annotation frame can be calculated and back-propagated to adjust the network parameters of each network in the target detection model; iterating this process completes the training of the model.
Further, in order to improve the assisting effect of the auxiliary features and thereby the detection accuracy of the target detection model, the embodiment of the invention encodes the sample detection frame and the annotation frame through an encoding function. The encoding couples the classification features with the auxiliary features, specifically the frame position, the height-width feature, and the rotation angle feature, so that the target detection model learns this coupling relationship and the trained model outputs more accurate auxiliary features.
Further, the encoding function may be a non-linear distribution function, such as a two-dimensional Gaussian distribution function. Specifically, it may be represented by the following formulas:

μ = (x, y)^T

Σ^(1/2) = R Λ R^T, with R = [[cos θ, −sin θ], [sin θ, cos θ]] and Λ = diag(w/2, h/2)

where (x, y, w, h, θ) is the representation of a detection frame: (x, y) is the coordinate of its center point, (w, h) its width and height, and θ the rotation angle of the rotating target in the frame. Through the formulas above, the detection frame (x, y, w, h, θ) is encoded into a two-dimensional Gaussian distribution (μ, Σ), where μ is the mean and Σ the covariance of the converted distribution. Both the sample detection frame and the annotation frame can be encoded by these formulas.
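A sketch of this encoding follows; the Σ^(1/2) = R Λ R^T construction is the standard rotated-box-to-Gaussian form and is assumed here, since the patent's original formula image is not recoverable:

```python
import numpy as np

def encode_gaussian(x, y, w, h, theta):
    """Encode a rotated box (x, y, w, h, theta) as a 2-D Gaussian (mu, Sigma):
    Sigma^(1/2) = R diag(w/2, h/2) R^T with R the rotation matrix of theta."""
    mu = np.array([x, y], dtype=np.float64)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w / 2.0, h / 2.0])     # half width/height on the diagonal
    sqrt_sigma = R @ S @ R.T
    sigma = sqrt_sigma @ sqrt_sigma     # covariance = (Sigma^(1/2))^2
    return mu, sigma
```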
After the sample detection frame is encoded, the sample function distribution (μ1, Σ1) is obtained; after the annotation frame is encoded, the annotation function distribution (μ2, Σ2) is obtained. The metric distance between (μ1, Σ1) and (μ2, Σ2) is then calculated. When the sample image is a positive sample, the smaller the metric distance, the more similar the sample detection frame and the annotation frame, and the better the detection result matches the real result; the larger the metric distance, the more dissimilar they are, and the worse the match. When the sample image is a negative sample, the smaller the metric distance, the more dissimilar the sample detection frame and the annotation frame, and the better the detection result matches the real result; the larger the metric distance, the more similar they are, and the worse the match.
The metric distance can be computed with measures such as the Wasserstein distance or the KL divergence. The embodiment of the invention preferably uses the Wasserstein distance between the sample function distribution (μ1, Σ1) and the annotation function distribution (μ2, Σ2), which can be represented by the following formula:

d² = ‖μ1 − μ2‖² + Tr(Σ1 + Σ2 − 2(Σ2^(1/2) Σ1 Σ2^(1/2))^(1/2))

where d is the metric distance between the sample function distribution (μ1, Σ1) and the annotation function distribution (μ2, Σ2), and Tr() denotes the trace of a matrix.
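A numerical sketch of this distance follows, using SciPy's matrix square root (the use of SciPy is an assumption; any positive-semidefinite matrix square root works):

```python
import numpy as np
from scipy.linalg import sqrtm

def wasserstein_distance(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between two 2-D Gaussians."""
    sqrt_s2 = np.real(sqrtm(sigma2))
    cross = np.real(sqrtm(sqrt_s2 @ sigma1 @ sqrt_s2))
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))   # clamp tiny negative round-off
```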
By encoding the detection frame and the annotation frame during training, the classification features and the auxiliary features are coupled through the encoding function, which improves the model's ability to learn the auxiliary features, so the trained target detection model can extract more accurate auxiliary features.
In a possible embodiment, the target detection model further includes a bias feature output network. Multi-scale features of the sample image are extracted through the feature extraction network and fused through the feature fusion network. The fused features are predicted through the classification feature output network to obtain the classification features, through the height-width feature output network to obtain the height-width features, through the rotation angle feature output network to obtain the rotation angle features, and through the bias feature output network to obtain the bias features. Key points are extracted from the classification features to obtain target key points, the corresponding target bias attributes are indexed in the bias features based on the target key points, and the detection result of the sample rotating target, which corresponds to the sample detection frame, is obtained from the target key points, the target height-width attributes, and the target rotation angle attributes.
By adding the output network corresponding to the auxiliary features in the target detection model, the classification features and the auxiliary features are coupled, so that the trained target detection model can output more accurate auxiliary features.
Optionally, in the step of inputting the sample image into the target detection model to obtain the sample detection frame corresponding to the sample rotating target, the sample image may be processed by the target detection model to obtain a sample feature map corresponding to the sample image, and a matrix grid is constructed for the sample feature map according to its height and width, where the sample feature map includes a classification feature map, a height-width feature map, and a rotation angle feature map corresponding to the sample image; an index of the height-width attribute is established at each grid point in the height-width feature map, and an index of the rotation angle attribute is established at each grid point in the rotation angle feature map; and the sample detection frame corresponding to the sample rotating target is obtained from each grid point in the sample feature map and the indexed attributes corresponding to that grid point.
In the embodiment of the invention, the sample image can be processed through the target detection model to obtain the sample feature map corresponding to the sample image. The sample feature map includes a classification feature map, a height-width feature map, and a rotation angle feature map, and in a possible embodiment may further include a bias feature map. The classification feature map, the height-width feature map, the rotation angle feature map, and the bias feature map all have the same height H and width W.
A matrix grid may be created for the sample feature map according to its height H and width W; specifically, the grid may be built with a meshgrid. For a grid point (i0, j0), an index relationship can be established in the height-width feature map to the width attribute w0 and height attribute h0 of the detection frame at that point; in the rotation angle feature map, an index to the rotation angle attribute θ0 at (i0, j0) can likewise be established; and in the bias feature map, indexes to the x-direction bias attribute dx0 and the y-direction bias attribute dy0 at (i0, j0) can be established. For the center point (i1, j1) of a sample detection frame, the sample height-width attribute (w1, h1), the sample rotation angle attribute θ1, and the sample bias attribute (dx1, dy1) can then be indexed, yielding the sample detection frame (x1, y1, w1, h1, θ1).
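The grid construction can be sketched with a meshgrid as follows; the array shapes are assumptions consistent with the head outputs sketched earlier:

```python
import numpy as np

def build_grid_and_decode(wh_map, angle_map):
    """Meshgrid over the H x W feature map so each grid point (i0, j0)
    indexes its own (w0, h0) and theta0 attributes."""
    _, hgt, wid = wh_map.shape                       # wh_map: (2, H, W)
    xs, ys = np.meshgrid(np.arange(wid), np.arange(hgt))
    boxes = np.stack([xs, ys,                        # grid-point coordinates
                      wh_map[0], wh_map[1],          # indexed width / height
                      angle_map[0]], axis=-1)        # indexed rotation angle
    return boxes                                     # (H, W, 5): x, y, w, h, theta
```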
By adding the output network corresponding to the auxiliary features in the target detection model and establishing the index relation between the classification features and the auxiliary features, the trained target detection model can output more accurate auxiliary features.
Optionally, the labeling frame includes labeling key points, and in the step of adjusting network parameters of the target detection model according to the measurement distance between the sample function distribution and the labeling function distribution, first losses between the sample key points and the labeling key points can be calculated according to the sample key points of the classification feature map; converting the measurement distance into a second loss through a preset conversion function; and adjusting network parameters of the target detection model based on the first loss and the second loss.
In the embodiment of the invention, a sample key point is the center point of a sample detection frame; as described above for target key points, sample key points are obtained through the max-pooling kernel. The first loss between the sample key points and the annotation key points may be calculated through a first loss function, represented by the following equation:

loss_hm = Gaussian_focal_loss(hm_pred, hm_target)

where loss_hm is the first loss, hm_pred is the prediction result for the center point of the sample detection frame, hm_target is the ground-truth label corresponding to the center point of the annotation frame, and Gaussian_focal_loss() is the first loss function.
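A common form of the Gaussian focal loss named above is sketched below; the exponents α = 2 and β = 4 are the usual CenterNet values and are assumed here:

```python
import torch

def gaussian_focal_loss(hm_pred, hm_target, alpha=2.0, beta=4.0):
    """CenterNet-style Gaussian focal loss over center-point heatmaps."""
    eps = 1e-6
    pos = hm_target.eq(1).float()                    # exact center points
    neg = 1.0 - pos
    pos_loss = -torch.log(hm_pred + eps) * (1 - hm_pred) ** alpha * pos
    neg_loss = (-torch.log(1 - hm_pred + eps) * hm_pred ** alpha
                * (1 - hm_target) ** beta * neg)
    num_pos = pos.sum().clamp(min=1.0)               # avoid division by zero
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```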
The second loss is obtained through a preset conversion function, which may be a non-linear function. Specifically, the preset conversion function may be represented by the following equation:

loss_rbbox = 1 − 1/(τ + f(d))

where loss_rbbox is the second loss, d is the metric distance between the sample function distribution and the annotation function distribution, f is a non-linear mapping (for example, f(d) = ln(1 + d)), and τ is an adjustable constant.
Based on the first loss and the second loss, the total loss over the classification features and the auxiliary features can be obtained, as shown by the following equation:

Loss = loss_hm + λ · loss_rbbox

where λ is a prior coefficient that can be adjusted according to prior knowledge during training.
In a possible embodiment, the auxiliary features further include a bias feature, and the loss between the bias attribute corresponding to the sample detection frame and the labeled bias attribute corresponding to the annotation frame may be calculated as a third loss through a third loss function, which may be represented by the following equation:

loss_offset = Smooth_L1(offset_pred, offset_target)

where loss_offset is the third loss, offset_pred is the prediction result for the bias attribute of the sample detection frame, offset_target is the ground-truth label of the bias attribute of the corresponding annotation frame, and Smooth_L1() is the third loss function.
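A one-line sketch using PyTorch's built-in Smooth-L1 (the mean reduction is an assumption; the text does not specify one):

```python
import torch
import torch.nn.functional as F

def offset_loss(offset_pred: torch.Tensor, offset_target: torch.Tensor) -> torch.Tensor:
    """Third loss: Smooth-L1 between predicted and labeled offsets."""
    return F.smooth_l1_loss(offset_pred, offset_target)
```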
In this embodiment, based on the first loss, the second loss, and the third loss, the total loss over the classification features and the auxiliary features is obtained as shown by the following equation:

Loss = loss_hm + λ1 · loss_offset + λ2 · loss_rbbox

where λ1 is a first prior coefficient and λ2 is a second prior coefficient; both can be adjusted according to prior knowledge during training.
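The conversion and the weighted combination can be sketched together as follows; the ln(1 + d) form of the conversion and the coefficient defaults of 1.0 are assumptions, not values stated in the patent:

```python
import math

def rbbox_loss(d: float, tau: float = 1.0) -> float:
    """One plausible non-linear conversion of the metric distance d into the
    second loss; the exact conversion function in the patent is not
    recoverable, so the ln(1 + d) form is assumed here."""
    return 1.0 - 1.0 / (tau + math.log1p(d))

def total_loss(loss_hm: float, loss_offset: float, loss_rbbox: float,
               lambda1: float = 1.0, lambda2: float = 1.0) -> float:
    """Total loss = loss_hm + lambda1*loss_offset + lambda2*loss_rbbox;
    the coefficients are prior values tuned during training."""
    return loss_hm + lambda1 * loss_offset + lambda2 * loss_rbbox
```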
In the embodiment of the invention, the output network corresponding to the auxiliary features is added, and the auxiliary features are trained through the corresponding first loss and second loss, so that the trained target detection model can output more accurate auxiliary features.
Optionally, in the step of performing auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected, given the coordinates (i, j) of a target key point, the target width attribute w and target height attribute h at the corresponding position are indexed in the height-width feature, the x-direction offset dx and y-direction offset dy at the corresponding position are indexed in the bias feature, and the target rotation angle θ at the corresponding position is indexed in the rotation angle feature. The rotating target to be detected can then be represented as (x, y, w, h, θ), where x = i + dx and y = j + dy. Finally, the four element values (x, y, w, h) of all detection results are scaled back to the original scale of the image to be detected, giving the final representation (x′, y′, w′, h′, θ) of the rotating target to be detected.
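A sketch of the final rescaling step; the overall feature-map stride of 4 is an assumption (typical of CenterNet-style heads), not a value stated in the patent:

```python
def rescale_detection(det, stride: float = 4.0):
    """Map (x, y, w, h, theta) from feature-map coordinates back to the
    original image scale; the angle theta is scale-invariant."""
    x, y, w, h, theta = det
    return (x * stride, y * stride, w * stride, h * stride, theta)
```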
In the embodiment of the invention, the classification characteristic and the auxiliary characteristic of the rotary target to be detected are extracted, and the classification characteristic is subjected to auxiliary processing by using the auxiliary characteristic, so that the detection result of the rotary target to be detected is obtained, and the anchor is not required to be designed, so that non-maximum suppression is not required, the detection accuracy of the rotary target detection is improved, and the detection performance of the rotary target detection is further improved.
It should be noted that the target detection method provided by the embodiment of the present invention may be applied to devices such as a smart phone, a computer, and a server that can perform target detection.
Optionally, referring to fig. 3, fig. 3 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus includes:
the first acquisition module 301 is configured to acquire an image to be detected, where the image to be detected includes a rotating target to be detected;
an extraction module 302, configured to perform feature extraction on the image to be detected through the trained target detection model to obtain a classification feature and an auxiliary feature of the to-be-detected rotating target;
and the processing module 303 is configured to perform auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the to-be-detected rotating target.
Optionally, the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network, and the extraction module 302 includes:
the first extraction submodule is used for extracting the features of the image to be detected through the feature extraction network to obtain the multi-scale features of the image to be detected;
the fusion submodule is used for carrying out feature fusion on the multi-scale features through the feature fusion network to obtain fusion features of the image to be detected;
the first prediction sub-module is used for predicting the fusion characteristics through the classification characteristic output network to obtain the classification characteristics of the to-be-detected rotating targets, wherein the classification characteristics comprise characteristic channels, and the to-be-detected rotating targets of different classes correspond to different characteristic channels;
and the second prediction submodule is used for predicting the fusion characteristics through the auxiliary characteristic output network to obtain the auxiliary characteristics of the to-be-detected rotating target.
Optionally, the auxiliary features include a height-width feature and a rotation angle feature, and the processing module 303 includes:
the second extraction submodule is used for extracting key points of the classification features to obtain target key points of the to-be-detected rotating target;
the indexing submodule is used for indexing a corresponding target height and width attribute in the height and width characteristic and indexing a corresponding target rotation angle attribute in the rotation angle characteristic based on the target key point;
and the first processing submodule is used for obtaining a detection result of the to-be-detected rotating target based on the target key point, the target height and width attribute and the target rotating angle attribute.
Optionally, the apparatus further comprises:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a training data set, the training data set comprises a sample image and an annotation frame, the sample image comprises a sample rotating target, and the annotation frame is an annotation frame of the sample rotating target;
and the training module is used for acquiring a target detection model and training the target detection model through the training data set to obtain the trained target detection model, wherein the target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network and a rotation angle feature output network.
Optionally, the training module includes:
the second processing submodule is used for inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target;
the coding sub-module is used for coding the sample detection frame and the labeling frame respectively through a preset coding function to respectively obtain sample function distribution corresponding to the sample detection frame and labeling function distribution corresponding to the labeling frame;
the adjusting submodule is used for adjusting network parameters of the target detection model according to the measurement distance between the sample function distribution and the labeling function distribution;
and the iteration submodule is used for iterating the network parameter adjustment process of the target detection model until the target detection model converges or reaches a preset iteration number, so as to obtain the trained target detection model.
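The patent leaves both the preset coding function and the metric abstract. One common instantiation for rotated boxes, sketched below purely as an assumption, encodes each box (cx, cy, w, h, theta) as a 2-D Gaussian and compares the sample and label encodings with the squared 2-Wasserstein distance; the closed-form 2x2 matrix square root keeps the computation cheap.

    import math
    import torch

    def encode_gaussian(box):
        """Encode a rotated box (cx, cy, w, h, theta) as a 2-D Gaussian (mu, Sigma).
        An illustrative coding function; the patent does not specify one."""
        cx, cy, w, h, theta = box
        mu = torch.tensor([cx, cy])
        R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                          [math.sin(theta),  math.cos(theta)]])
        S = torch.diag(torch.tensor([w * w / 4.0, h * h / 4.0]))
        return mu, R @ S @ R.T

    def sqrtm_2x2(M):
        # Closed-form principal square root of a 2x2 symmetric positive-definite matrix.
        s = torch.sqrt(torch.det(M))
        return (M + s * torch.eye(2)) / torch.sqrt(torch.trace(M) + 2.0 * s)

    def metric_distance(sample_box, label_box):
        """Squared 2-Wasserstein distance between the two Gaussian encodings."""
        mu1, S1 = encode_gaussian(sample_box)
        mu2, S2 = encode_gaussian(label_box)
        r1 = sqrtm_2x2(S1)
        cross = sqrtm_2x2(r1 @ S2 @ r1)
        return ((mu1 - mu2) ** 2).sum() + torch.trace(S1 + S2 - 2.0 * cross)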
Optionally, the second processing sub-module includes:
the first processing unit is used for processing the sample image through the target detection model to obtain a sample feature map corresponding to the sample image, and constructing a matrix grid for the sample feature map according to the height and the width of the sample feature map, wherein the sample feature map comprises a classification feature map, a height and width feature map and a rotation angle feature map corresponding to the sample image;
an index establishing unit, configured to establish an index of a height-width attribute at each grid point in the height-width feature map, and establish an index of a rotation angle attribute at each grid point in the rotation angle feature map;
and the second processing unit is used for obtaining a sample detection frame corresponding to the sample rotating target according to each grid point in the sample feature map and the index attribute corresponding to the grid point.
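A minimal sketch of this grid-and-index construction, assuming feature maps shaped (C, H, W) and an assumed stride that maps grid points back to image coordinates:

    import torch

    def boxes_from_grid(hw_map, angle_map, stride=4):
        """hw_map: (2, H, W); angle_map: (1, H, W). One candidate box per grid point."""
        H, W = hw_map.shape[1:]
        # Matrix grid built from the feature map's height and width.
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        cx = (xs.float() + 0.5) * stride   # grid point -> image x (stride is assumed)
        cy = (ys.float() + 0.5) * stride   # grid point -> image y
        w, h = hw_map[0], hw_map[1]        # height-width attribute indexed at each grid point
        theta = angle_map[0]               # rotation angle attribute indexed at each grid point
        return torch.stack([cx, cy, w, h, theta], dim=-1)  # (H, W, 5) sample detection frames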
Optionally, the labeling frame includes a labeling key point, and the adjusting sub-module includes:
the calculating unit is used for calculating a first loss between the sample key point and the labeling key point according to the sample key point of the classification feature map;
the conversion unit is used for converting the measurement distance into a second loss through a preset conversion function;
and the adjusting unit is used for adjusting network parameters of the target detection model based on the first loss and the second loss.
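Sketched below is one plausible combination: a focal-style penalty between sample and labeled key points for the first loss, and a bounded log-based mapping of the metric distance for the second. Both the focal form and the conversion function are assumptions; the patent only names "a preset conversion function".

    import torch

    def keypoint_loss(pred, gt, alpha=2.0, beta=4.0):
        """First loss: focal-style penalty between predicted and labeled keypoint heatmaps."""
        pos = gt.eq(1.0)
        pos_loss = -((1 - pred[pos]) ** alpha * torch.log(pred[pos] + 1e-8)).sum()
        neg_loss = -(((1 - gt[~pos]) ** beta) * (pred[~pos] ** alpha)
                     * torch.log(1 - pred[~pos] + 1e-8)).sum()
        return (pos_loss + neg_loss) / max(pos.sum().item(), 1)

    def distance_to_loss(d, tau=1.0):
        """Second loss: convert a metric-distance tensor into a bounded loss value."""
        return 1.0 - 1.0 / (tau + torch.log1p(d))

    def total_loss(pred, gt, metric_distance, w1=1.0, w2=1.0):
        # Network parameters are then adjusted from the weighted sum of both losses.
        return w1 * keypoint_loss(pred, gt) + w2 * distance_to_loss(metric_distance)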
The target detection device provided by the embodiment of the invention can be applied to devices capable of performing target detection, such as smartphones, computers and servers.
The target detection device provided by the embodiment of the invention can implement each process of the target detection method in the foregoing method embodiment and achieve the same beneficial effects. To avoid repetition, details are not repeated here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device includes: a memory 402, a processor 401, and a computer program for the target detection method stored in the memory 402 and executable on the processor 401, wherein:
the processor 401 is configured to call the computer program stored in the memory 402, and execute the following steps:
acquiring an image to be detected, wherein the image to be detected comprises a rotating target to be detected;
performing feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotary target to be detected;
and carrying out auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the to-be-detected rotating target.
Optionally, the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network, and the processor 401 performs feature extraction on the image to be detected through the trained target detection model to obtain the classification feature and the auxiliary feature of the to-be-detected rotating target, including:
extracting the features of the image to be detected through the feature extraction network to obtain the multi-scale features of the image to be detected;
performing feature fusion on the multi-scale features through the feature fusion network to obtain fusion features of the image to be detected;
predicting the fusion characteristics through the classification characteristic output network to obtain the classification characteristics of the to-be-detected rotating targets, wherein the classification characteristics comprise characteristic channels, and the to-be-detected rotating targets of different classes correspond to different characteristic channels;
and predicting the fusion characteristics through the auxiliary characteristic output network to obtain the auxiliary characteristics of the to-be-detected rotating target.
Optionally, the auxiliary features include height and width features and rotation angle features, and the performing, by the processor 401, auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the to-be-detected rotating target includes:
extracting key points of the classification features to obtain target key points of the to-be-detected rotating target;
based on the target key points, indexing a corresponding target height and width attribute in the height and width feature and indexing a corresponding target rotation angle attribute in the rotation angle feature;
and obtaining a detection result of the to-be-detected rotating target based on the target key point, the target height and width attribute and the target rotating angle attribute.
Optionally, before the feature extraction is performed on the image to be detected through the trained target detection model to obtain the classification feature and the auxiliary feature of the to-be-detected rotating target, the method executed by the processor 401 further includes:
acquiring a training data set, wherein the training data set comprises a sample image and an annotation frame, the sample image comprises a sample rotating target, and the annotation frame is an annotation frame of the sample rotating target;
and acquiring a target detection model, and training the target detection model through the training data set to obtain the trained target detection model, wherein the target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network, a height and width feature output network and a rotation angle feature output network.
Optionally, the acquiring a target detection model and training the target detection model through the training data set to obtain the trained target detection model, performed by the processor 401, includes:
inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target;
respectively coding the sample detection frame and the labeling frame through preset coding functions to respectively obtain sample function distribution corresponding to the sample detection frame and labeling function distribution corresponding to the labeling frame;
adjusting network parameters of the target detection model according to the measurement distance between the sample function distribution and the labeling function distribution;
and iterating the network parameter adjusting process of the target detection model until the target detection model converges or reaches a preset iteration number, so as to obtain the trained target detection model.
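A schematic training loop for this iterate-until-done procedure; the optimizer, learning rate, and convergence test are illustrative assumptions, not prescribed by the patent:

    import itertools
    import torch

    def train(model, data_loader, loss_fn, max_iters=10000, tol=1e-4):
        """Adjust network parameters until convergence or a preset iteration count."""
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        prev_loss = float("inf")
        for step, (images, targets) in enumerate(itertools.cycle(data_loader)):
            loss = loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if step + 1 >= max_iters or abs(prev_loss - loss.item()) < tol:
                break  # preset iteration count reached, or treated as converged
            prev_loss = loss.item()
        return model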
Optionally, the inputting, performed by the processor 401, the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target includes:
processing the sample image through the target detection model to obtain a sample feature map corresponding to the sample image, and constructing a matrix grid for the sample feature map according to the height and the width of the sample feature map, wherein the sample feature map comprises a classification feature map, a height and width feature map and a rotation angle feature map corresponding to the sample image;
establishing an index of a height and width attribute at each grid point in the height and width feature map, and establishing an index of a rotation angle attribute at each grid point in the rotation angle feature map;
and obtaining a sample detection frame corresponding to the sample rotating target according to each grid point in the sample feature map and the index attribute corresponding to the grid point.
Optionally, the labeling frame includes a labeling key point, and the adjusting, performed by the processor 401, the network parameters of the target detection model according to the metric distance between the sample function distribution and the labeling function distribution includes:
calculating a first loss between the sample key point and the labeling key point according to the sample key point of the classification feature map;
converting the measurement distance into a second loss through a preset conversion function;
and adjusting network parameters of the target detection model based on the first loss and the second loss.
The electronic device provided by the embodiment of the invention can implement each process of the target detection method in the foregoing method embodiment and achieve the same beneficial effects. To avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements each process of the target detection method or the application-side target detection method provided in the embodiments of the present invention and can achieve the same technical effects. To avoid repetition, details are not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A target detection method, characterized in that the target detection method is used for detection of a rotating target and comprises the following steps:
acquiring an image to be detected, wherein the image to be detected comprises a rotating target to be detected;
performing feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected;
and carrying out auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the to-be-detected rotating target.
2. The target detection method of claim 1, wherein the trained target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network and an auxiliary feature output network, and the step of performing feature extraction on the image to be detected through the trained target detection model to obtain the classification feature and the auxiliary feature of the rotating target to be detected comprises:
extracting the features of the image to be detected through the feature extraction network to obtain the multi-scale features of the image to be detected;
performing feature fusion on the multi-scale features through the feature fusion network to obtain fusion features of the image to be detected;
predicting the fusion characteristics through the classification characteristic output network to obtain the classification characteristics of the to-be-detected rotating targets, wherein the classification characteristics comprise characteristic channels, and the to-be-detected rotating targets of different classes correspond to different characteristic channels;
and predicting the fusion characteristics through the auxiliary characteristic output network to obtain the auxiliary characteristics of the to-be-detected rotating target.
3. The target detection method of claim 2, wherein the auxiliary features include height and width features and rotation angle features, and the performing auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the to-be-detected rotating target includes:
extracting key points of the classification features to obtain target key points of the to-be-detected rotating target;
based on the target key points, indexing a corresponding target height and width attribute in the height and width features and indexing a corresponding target rotation angle attribute in the rotation angle features;
and obtaining a detection result of the to-be-detected rotating target based on the target key point, the target height and width attribute and the target rotating angle attribute.
4. The target detection method of claim 3, wherein before the feature extraction is performed on the image to be detected through the trained target detection model to obtain the classification features and the auxiliary features of the rotating target to be detected, the method further comprises:
acquiring a training data set, wherein the training data set comprises a sample image and an annotation frame, the sample image comprises a sample rotating target, and the annotation frame is an annotation frame of the sample rotating target;
and acquiring a target detection model, and training the target detection model through the training data set to obtain the trained target detection model, wherein the target detection model comprises a feature extraction network, a feature fusion network, a classification feature output network, a height and width feature output network and a rotation angle feature output network.
5. The target detection method of claim 4, wherein the acquiring a target detection model and training the target detection model through the training data set to obtain a trained target detection model comprises:
inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target;
respectively coding the sample detection frame and the labeling frame through preset coding functions to respectively obtain sample function distribution corresponding to the sample detection frame and labeling function distribution corresponding to the labeling frame;
adjusting network parameters of the target detection model according to the measurement distance between the sample function distribution and the labeling function distribution;
and iterating the network parameter adjusting process of the target detection model until the target detection model converges or reaches a preset iteration number, so as to obtain the trained target detection model.
6. The target detection method of claim 5, wherein the inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target comprises:
processing the sample image through the target detection model to obtain a sample feature map corresponding to the sample image, and constructing a matrix grid for the sample feature map according to the height and the width of the sample feature map, wherein the sample feature map comprises a classification feature map, a height and width feature map and a rotation angle feature map corresponding to the sample image;
establishing an index of a height and width attribute at each grid point in the height and width feature map, and establishing an index of a rotation angle attribute at each grid point in the rotation angle feature map;
and obtaining a sample detection frame corresponding to the sample rotating target according to each grid point in the sample feature map and the index attribute corresponding to the grid point.
7. The method of claim 6, wherein the labeling frame includes a labeling key point, and the adjusting the network parameters of the target detection model according to the metric distance between the sample function distribution and the labeling function distribution comprises:
calculating a first loss between the sample key point and the labeling key point according to the sample key point of the classification feature map;
converting the measurement distance into a second loss through a preset conversion function;
and adjusting network parameters of the target detection model based on the first loss and the second loss.
8. A target detection apparatus, characterized in that the apparatus comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring an image to be detected, and the image to be detected comprises a rotating target to be detected;
the extraction module is used for extracting the characteristics of the image to be detected through the trained target detection model to obtain the classification characteristics and the auxiliary characteristics of the rotating target to be detected;
and the processing module is used for carrying out auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the to-be-detected rotating target.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the target detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps in the target detection method according to any one of claims 1 to 7.
CN202210819568.0A 2022-07-12 2022-07-12 Target detection method and device, electronic equipment and storage medium Pending CN115311553A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210819568.0A CN115311553A (en) 2022-07-12 2022-07-12 Target detection method and device, electronic equipment and storage medium
PCT/CN2022/143514 WO2024011873A1 (en) 2022-07-12 2022-12-29 Target detection method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210819568.0A CN115311553A (en) 2022-07-12 2022-07-12 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115311553A (en) 2022-11-08

Family

ID=83856438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210819568.0A Pending CN115311553A (en) 2022-07-12 2022-07-12 Target detection method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115311553A (en)
WO (1) WO2024011873A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024011873A1 (en) * 2022-07-12 2024-01-18 青岛云天励飞科技有限公司 Target detection method and apparatus, electronic device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191566B (en) * 2019-12-26 2022-05-17 西北工业大学 Optical remote sensing image multi-target detection method based on pixel classification
CN111931877B (en) * 2020-10-12 2021-01-05 腾讯科技(深圳)有限公司 Target detection method, device, equipment and storage medium
CN114757250A (en) * 2020-12-29 2022-07-15 华为云计算技术有限公司 Image processing method and related equipment
CN113420648B (en) * 2021-06-22 2023-05-05 深圳市华汉伟业科技有限公司 Target detection method and system with rotation adaptability
CN115311553A (en) * 2022-07-12 2022-11-08 青岛云天励飞科技有限公司 Target detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2024011873A1 (en) 2024-01-18

Similar Documents

Publication Publication Date Title
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN111523414A (en) Face recognition method and device, computer equipment and storage medium
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN112989995B (en) Text detection method and device and electronic equipment
CN111079847A (en) Remote sensing image automatic labeling method based on deep learning
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN110826509A (en) Grassland fence information extraction system and method based on high-resolution remote sensing image
CN103679740B (en) ROI (Region of Interest) extraction method of ground target of unmanned aerial vehicle
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN115311553A (en) Target detection method and device, electronic equipment and storage medium
CN114926826A (en) Scene text detection system
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN111291712B (en) Forest fire recognition method and device based on interpolation CN and capsule network
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN115311680A (en) Human body image quality detection method and device, electronic equipment and storage medium
CN111027399B (en) Remote sensing image water surface submarine recognition method based on deep learning
CN111194004B (en) Base station fingerprint positioning method, device and system and computer readable storage medium
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
CN116030347B (en) High-resolution remote sensing image building extraction method based on attention network
CN117876362B (en) Deep learning-based natural disaster damage assessment method and device
Yang et al. A deep learning approach for automated segmentation of magnetic bright points in the solar photosphere
Lin Automatic recognition and detection of building targets in urban remote sensing images using an improved regional convolutional neural network algorithm
CN115810020B (en) Semantic guidance-based coarse-to-fine remote sensing image segmentation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination