WO2024011873A1 - 目标检测方法、装置、电子设备及存储介质 - Google Patents

目标检测方法、装置、电子设备及存储介质 Download PDF

Info

Publication number
WO2024011873A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
feature
detected
sample
features
Prior art date
Application number
PCT/CN2022/143514
Other languages
English (en)
French (fr)
Inventor
刘文龙
曾卓熙
肖嵘
王孝宇
Original Assignee
青岛云天励飞科技有限公司
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 青岛云天励飞科技有限公司, 深圳云天励飞技术股份有限公司 filed Critical 青岛云天励飞科技有限公司
Publication of WO2024011873A1 publication Critical patent/WO2024011873A1/zh

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00, 20/10: Scenes; Scene-specific elements; Terrestrial scenes
    • G06V 10/40, 10/42: Extraction of image or video features; Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/70, 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
    • G06V 10/77, 10/7715: Processing image or video features in feature spaces; Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80, 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level; of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 2201/00, 2201/07: Indexing scheme relating to image or video recognition or understanding; Target detection

Definitions

  • the present invention relates to the field of artificial intelligence, and in particular, to a target detection method, device, electronic equipment and storage medium.
  • Rotating target detection refers to detecting targets with a rotating direction, that is, the center point, width, height, and angle of the target need to be detected. It is more common in target detection in overhead views, such as remote sensing image target detection, aerial image target detection, etc.
  • Current rotating target detection often relies on anchor-based detection algorithms with non-maximum suppression. Because rotating targets appear at different sizes and positions in an image, anchor-based detection algorithms have inherent shortcomings: to cover rotating targets of all sizes and positions, the anchor design becomes very complicated, since anchors of different aspect ratios and sizes must be designed; moreover, an inappropriate choice of anchor ratios and sizes also affects the subsequent non-maximum suppression and causes error accumulation, which degrades the detection accuracy of the target detection model. Existing rotating target detection algorithms therefore suffer from low detection accuracy.
  • Embodiments of the present invention provide a target detection method, aiming to solve the problem of low detection accuracy of rotating target detection algorithms in existing target detection. By extracting classification features and auxiliary features of the rotating target to be detected and using the auxiliary features to assist the classification features, the detection results of the rotating target to be detected are obtained.
  • embodiments of the present invention provide a target detection method, which is used to detect rotating targets.
  • the method includes:
  • the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network.
  • performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected includes:
  • feature extraction is performed on the image to be detected through the feature extraction network to obtain multi-scale features of the image to be detected;
  • Feature fusion is performed on the multi-scale features through the feature fusion network to obtain the fusion features of the image to be detected;
  • the fusion features are predicted by the classification feature output network to obtain the classification features of the rotation target to be detected, the classification features include feature channels, and different categories of rotation targets to be detected correspond to different feature channels;
  • the fusion feature is predicted by the auxiliary feature output network to obtain the auxiliary feature of the rotating target to be detected.
  • the auxiliary features include height and width features and rotation angle features.
  • the auxiliary processing of the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected includes:
  • a detection result of the rotating target to be detected is obtained.
  • before performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected, the method further includes:
  • the training data set includes a sample image and an annotation frame
  • the sample image includes a sample rotation target
  • the annotation frame is an annotation frame of the sample rotation target
  • the target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network, and a rotation angle feature output network.
  • obtaining a target detection model, and training the target detection model through the training data set to obtain a trained target detection model including:
  • the sample detection frame and the labeling frame are respectively encoded by preset encoding functions to obtain respectively the sample function distribution corresponding to the sample detection frame and the labeling function distribution corresponding to the labeling frame;
  • the network parameter adjustment process of the target detection model is iterated until the target detection model converges or reaches a preset number of iterations, and a trained target detection model is obtained.
  • inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotation target includes:
  • the sample image is processed by the target detection model to obtain a sample feature map corresponding to the sample image, and a matrix grid is constructed for the sample feature map according to the height and width of the sample feature map.
  • the sample feature maps include a classification feature map, a height-width feature map and a rotation angle feature map corresponding to the sample image;
  • an index of the height-width attribute is established for each grid point in the height-width feature map, and an index of the rotation angle attribute is established for each grid point in the rotation angle feature map;
  • annotation box includes annotation key points
  • adjusting the network parameters of the target detection model according to the metric distance between the sample function distribution and the annotation function distribution includes:
  • network parameters of the target detection model are adjusted.
  • an embodiment of the present invention provides a target detection device, which includes:
  • a first acquisition module configured to acquire an image to be detected, where the image to be detected includes a rotating target to be detected
  • An extraction module configured to perform feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected;
  • a processing module configured to perform auxiliary processing on the classification feature based on the auxiliary feature to obtain the detection result of the rotating target to be detected.
  • embodiments of the present invention provide an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, the steps in the target detection method provided by the embodiments of the present invention are implemented.
  • embodiments of the present invention provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the steps of the target detection method provided by the embodiments of the present invention are implemented.
  • an image to be detected is obtained, and the image to be detected includes a rotating target to be detected; feature extraction is performed on the image to be detected through a trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected; and auxiliary processing is performed on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected.
  • Figure 1 is a flow chart of a target detection method provided by an embodiment of the present invention.
  • Figure 2 is a schematic structural diagram of a target detection model provided by an embodiment of the present invention.
  • Figure 3 is a schematic structural diagram of a target detection device provided by an embodiment of the present invention.
  • Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • Figure 1 is a flow chart of a target detection method provided by an embodiment of the present invention. As shown in Figure 1, the target detection method is used to detect rotating targets. The target detection method includes the following steps:
  • the image to be detected includes a rotating target to be detected.
  • the above-mentioned image to be detected can be a side image, a top-view image, a bottom-view image, etc.
  • the side image can be an image taken from the side of the target
  • the top-view image can be an image taken from above the target
  • the bottom-view image can be an image taken from below the target.
  • the above-mentioned rotating targets to be detected may be persons, vehicles, aircraft, buildings, objects and other physical targets.
  • the image to be detected can be input into the trained target detection model, and the features of the image to be detected can be extracted through the target detection model to obtain the classification features and auxiliary features of the rotating target to be detected.
  • the image to be detected before inputting the image to be detected into the trained target detection model, can be preprocessed.
  • the above preprocessing can include image pixel normalization and scaling of the width and height to a size of H₀ × W₀, where H₀ and W₀ are integer multiples of 32.
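  • As an illustration of this preprocessing step, the sketch below normalizes pixel values and resizes an image to H₀ × W₀ (both multiples of 32); the nearest-neighbour resize and the default 512 × 512 size are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def preprocess(image, h0=512, w0=512):
    """Minimal sketch of the preprocessing described above: pixel normalization
    plus resizing to H0 x W0 (multiples of 32). H0/W0 defaults and the
    nearest-neighbour resize are assumptions for illustration."""
    assert h0 % 32 == 0 and w0 % 32 == 0
    h, w = image.shape[:2]
    # Nearest-neighbour resize via index sampling, to avoid extra dependencies.
    rows = (np.arange(h0) * h / h0).astype(int)
    cols = (np.arange(w0) * w / w0).astype(int)
    resized = image[rows][:, cols]
    # Pixel normalization to [0, 1].
    return resized.astype(np.float32) / 255.0
```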
  • the above classification features of the rotating target to be detected include category information of the rotating target to be detected.
  • the categories of the rotating target to be detected are people, vehicles, aircrafts, buildings, objects, etc.
  • the above-mentioned auxiliary features of the rotating target to be detected may include attribute information such as the height, width, and rotation angle of the rotating target to be detected.
  • the trained target detection model includes a classification feature branch structure and an auxiliary feature branch structure.
  • the trained target detection model can first extract shared features from the image to be detected, output the corresponding classification features through the classification feature branch structure, and output the corresponding auxiliary features through the auxiliary feature branch structure.
  • the classification feature branch structure and the auxiliary feature branch structure have different structural parameters.
  • the above target detection model can be constructed based on a deep convolutional neural network; after the deep convolutional neural network is trained, the trained target detection model is obtained.
  • sample images can be collected.
  • the sample images include sample rotation targets.
  • the sample rotation targets can be people, vehicles, aircraft, buildings, objects, etc.
  • the sample rotation targets in the sample images are labeled to obtain the corresponding label data.
  • the labeling includes category annotations corresponding to the classification features and attribute annotations corresponding to the auxiliary features; the attribute annotations may include height-width annotations and rotation angle annotations.
  • the deep convolutional neural network is trained through the sample image and the corresponding label data, so that the deep convolutional neural network learns the classification features of the rotating target and the auxiliary features of the rotating target and outputs them. After the training is completed, the trained target detection model is obtained.
  • the auxiliary features of the rotating target to be detected may include attribute information such as the height, width, and rotation angle of the rotating target to be detected.
  • the attribute information indexed from the auxiliary features can be added to the classification features, thereby obtaining the detection results of the rotating target to be detected.
  • the above-mentioned detection results may include the position, category, height, width, rotation angle, etc. of the rotating target to be detected.
  • In the embodiments of the present invention, an image to be detected is obtained, and the image to be detected includes a rotating target to be detected; feature extraction is performed on the image to be detected through a trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected; and auxiliary processing is performed on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected.
  • Optionally, the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network.
  • In the step of performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected, feature extraction can be performed on the image to be detected through the feature extraction network to obtain multi-scale features of the image to be detected, and feature fusion is performed on the multi-scale features through the feature fusion network to obtain the fusion features of the image to be detected.
  • The fusion features are predicted through the classification feature output network to obtain the classification features of the rotating target to be detected; the classification features include feature channels, and different categories of rotating targets to be detected correspond to different feature channels. The fusion features are predicted through the auxiliary feature output network to obtain the auxiliary features of the rotating target to be detected.
  • the above feature extraction network may be composed of a backbone network, such as VGG19, ResNet, MobileNet, etc.
  • the embodiment of the present invention does not place any restrictions on the feature extraction network.
  • the above feature extraction network can extract features of the image to be detected at different scales to obtain the multi-scale features of the image to be detected. It should be noted that, due to the presence of downsampling layers in the feature extraction network, the scale of the extracted features becomes smaller as the computational depth of the feature extraction network increases.
  • the above feature fusion network may include an upsampling layer and a fusion layer: smaller-scale features are upsampled through the upsampling layer into larger-scale features, and the upsampled features are then fused with features of the same size through the fusion layer. Specifically, the feature fusion network takes multi-scale features from different stages of the feature extraction network, upsamples the small-scale features by a factor of 2 one by one and fuses them with the same-scale features from the feature extraction network, and finally outputs the high-resolution fused features to the prediction networks.
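  • A minimal sketch of such an FPN-style fusion neck is given below: small-scale features are upsampled by a factor of 2 step by step and fused with same-scale backbone features, and the highest-resolution fused map is passed on to the prediction networks. The channel counts, 1×1 lateral convolutions and element-wise addition are assumptions; the patent does not fix these details.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFusionNeck(nn.Module):
    """Sketch of the fusion network described above: upsample smaller features
    by 2x and fuse them with same-scale backbone features. Channel counts and
    element-wise addition as the fusion operation are assumptions."""
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels])

    def forward(self, feats):
        # feats: backbone feature maps, largest spatial scale first,
        # each consecutive map half the resolution of the previous one.
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        fused = laterals[-1]                      # start from the smallest scale
        for lat in reversed(laterals[:-1]):
            fused = F.interpolate(fused, scale_factor=2, mode="nearest") + lat
        return fused                              # highest-resolution fused feature
```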
  • the above classification feature output network and auxiliary feature output network can also be called prediction networks.
  • the corresponding classification features are output through the classification feature output network; the output classification features can include multiple feature channels, and each feature channel corresponds to one category of rotating target to be detected.
  • the above classification feature output network can be a classification feature output network based on CenterNetR.
  • the fusion features can be input to the CenterNetR-based classification feature output network, and CenterNetR predicts the center point heat map of the rotating target to be detected as the classification features; the center point heat maps of different categories are distributed in different feature channels, so the category of the rotating target to be detected can be determined from the feature channels.
  • The above auxiliary feature output network can also be an attribute feature output network based on CenterNetR: the fusion features can be input to the CenterNetR-based attribute feature output network, and the attribute information corresponding to each center point is predicted through CenterNetR as the auxiliary features.
  • The scale resolution of the above auxiliary features is the same as the scale resolution of the classification features.
  • the above-mentioned auxiliary features may include attribute information such as the height, width, and rotation angle of the rotating target to be detected.
  • each position point corresponds to a set of attribute information. Through the center point position of the classification feature, the attribute information of the corresponding position can be indexed.
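  • The description above names CenterNetR-based output branches; the sketch below shows a generic set of parallel prediction heads in that spirit: a per-category center-point heatmap plus height-width, rotation angle and offset branches, all at the same H × W resolution as the fused feature. The layer sizes and head structure are illustrative assumptions rather than the CenterNetR implementation.

```python
import torch
import torch.nn as nn

class RotatedCenterHeads(nn.Module):
    """Sketch of the parallel output branches described above. One feature
    channel per category in the heatmap head; the remaining heads predict a set
    of attributes at every grid point. Layer sizes are assumptions."""
    def __init__(self, in_channels=256, num_classes=3):
        super().__init__()
        def head(out_c):
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, out_c, 1))
        self.heatmap = head(num_classes)   # center-point heatmap, one channel per category
        self.wh = head(2)                  # (w, h) at every grid point
        self.angle = head(1)               # rotation angle at every grid point
        self.offset = head(2)              # (dx, dy) sub-grid offset at every grid point

    def forward(self, fused):
        return {
            "hm": torch.sigmoid(self.heatmap(fused)),
            "wh": self.wh(fused),
            "ang": self.angle(fused),
            "off": self.offset(fused),
        }
```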
  • the auxiliary features include height and width features and rotation angle features.
  • In the step of performing auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected, key points can be extracted from the classification features to obtain the target key points of the rotating target to be detected; based on the target key points, the corresponding target height-width attributes are indexed in the height-width features and the corresponding target rotation angle attributes are indexed in the rotation angle features; based on the target key points, the target height-width attributes and the target rotation angle attributes, the detection result of the rotating target to be detected is obtained.
  • In the embodiments of the present invention, the above auxiliary features include height-width features and rotation angle features; the height-width features correspond to the height and width attributes of the rotating target to be detected, and the rotation angle features correspond to the rotation angle attribute of the rotating target to be detected.
  • Each position point in the classification features corresponds to a position point in the height-width features and a position point in the rotation angle features. Therefore, according to the target key point in the classification features, the height-width attribute at the corresponding position point in the height-width features can be indexed as the target height-width attribute, and the rotation angle attribute at the corresponding position point in the rotation angle features can be indexed as the target rotation angle attribute.
  • Specifically, the height-width features and the rotation angle features have the same scale resolution as the classification features: the classification features have height H and width W, the height-width features have height H and width W, and the rotation angle features have height H and width W.
  • the classification feature can be a center point heat map.
  • the center point of the heat map is the position point with the highest heat value. This center point can also be used as a target key point.
  • Specifically, after the classification features are obtained, the classification features can be sampled through an n*n maximum pooling kernel, and key points with high confidence are obtained as target key points according to a preset confidence threshold, where n is smaller than H and n is smaller than W. The maximum pooling kernel samples the maximum value from an n*n region of the heat map; taking n = 3 as an example, the highest heat value in a 3*3 region of the heat map is taken as the sampled value of the maximum pooling kernel.
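  • A minimal sketch of this key-point extraction is given below, assuming the heatmap is a (num_classes, H, W) tensor with values in [0, 1]; the confidence threshold value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def extract_keypoints(heatmap, n=3, conf_thresh=0.3):
    """Sketch of the key-point extraction described above: an n x n max-pooling
    kernel keeps local maxima of the center-point heatmap, and points above a
    confidence threshold become target key points."""
    # heatmap: (num_classes, H, W); n is odd and smaller than H and W.
    pooled = F.max_pool2d(heatmap.unsqueeze(0), n, stride=1, padding=n // 2).squeeze(0)
    peaks = (heatmap == pooled) & (heatmap > conf_thresh)   # keep local maxima only
    cls, i, j = torch.nonzero(peaks, as_tuple=True)          # (category, row, col) indices
    return cls, i, j, heatmap[cls, i, j]                      # key points and their confidences
```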
  • After the target key point (i, j) is obtained, where (i, j) represents the position coordinates of the target key point in the classification features, the corresponding target height-width attributes (w, h) can be indexed in the height-width features according to the target key point (i, j), where w represents the width of the rotating target to be detected in the classification features and h represents its height in the classification features.
  • Likewise, the corresponding target rotation angle attribute θ can be indexed in the rotation angle features according to the target key point (i, j), where θ represents the rotation angle of the rotating target to be detected in the classification features.
  • Based on the target key point (i, j), the target height-width attributes (w, h) and the target rotation angle attribute θ, the detection result (i, j, w, h, θ) of the rotating target to be detected can be obtained.
  • the above-mentioned auxiliary features may also include offset features, and the above-mentioned offset features are used to describe the offset of the target key point.
  • Specifically, the above offset features can be obtained by adding an offset feature output network to the target detection model and predicting the fusion features through the offset feature output network.
  • The offset features also have the same scale resolution as the classification features: the offset features have height H and width W.
  • Each position point in the classification features corresponds to a position point in the offset features. The corresponding target offset attributes (dx, dy) can be indexed in the offset features according to the target key point (i, j), so that the final position of the rotating target to be detected is (x, y), where x = i + dx and y = j + dy; combined with the target height-width attributes (w, h) and the target rotation angle attribute θ, the detection result (x, y, w, h, θ) of the rotating target to be detected can be obtained.
  • Obtaining the corresponding target offset attributes by indexing the offset features at the target key points allows the position of the rotating target to be detected to be determined more accurately.
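  • A minimal sketch of this decoding step is given below, combining the key points with the indexed height-width, rotation angle and offset attributes; x = i + dx and y = j + dy follows the description, while the tensor layouts are assumptions.

```python
import torch

def decode_detections(wh, ang, off, keypoints):
    """Sketch of the auxiliary processing described above: for every target key
    point (i, j), index (w, h) in the height-width features, theta in the
    rotation angle features and (dx, dy) in the offset features, giving
    (x, y, w, h, theta). Attribute-first (C, H, W) layouts are assumptions."""
    cls, i, j, score = keypoints                 # e.g. the output of extract_keypoints above
    w = wh[0, i, j]                              # target width attribute
    h = wh[1, i, j]                              # target height attribute
    theta = ang[0, i, j]                         # target rotation angle attribute
    dx, dy = off[0, i, j], off[1, i, j]          # target offset attribute
    x = i.float() + dx                           # refined center, as in the description
    y = j.float() + dy
    return torch.stack([x, y, w, h, theta], dim=-1), cls, score
```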
  • Optionally, before performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected, a training data set can also be obtained. The training data set includes sample images and annotation frames; a sample image includes a sample rotating target, and the annotation frame is the annotation frame of the sample rotating target. A target detection model is obtained and trained through the training data set to obtain the trained target detection model.
  • The target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network, and a rotation angle feature output network.
  • the target detection model may be trained before performing feature extraction on the image to be detected through the trained target detection model.
  • Sample images containing sample rotation targets can be collected and annotated to obtain a training data set.
  • the sample rotation target has the same category as the rotation target to be detected.
  • Experts can mark the rotation target in the sample image to obtain the label frame.
  • the label frame includes the category of the sample rotation target, the position of the label frame, the height and width of the label frame, and the rotation angle of the label frame.
  • In a possible embodiment, when the auxiliary features include offset features, the annotation frame also includes a target offset, and in this case the target detection model also includes an offset feature output network. It should be noted that the classification feature output network, the height-width feature output network, the rotation angle feature output network and the offset feature output network are all independent, parallel branch networks.
  • Figure 2 is a schematic structural diagram of a target detection model provided by an embodiment of the present invention.
  • the output of the feature extraction network is connected to the input of the feature fusion network.
  • the output of the feature fusion network is connected to the inputs of the classification feature output network, height and width feature output network, rotation angle feature output network and bias feature output network respectively.
  • The target detection model is trained through the training data set. During training, the network parameters of the feature extraction network, the feature fusion network, the classification feature output network, the height-width feature output network, the rotation angle feature output network and the offset feature output network are iteratively adjusted until the target detection model converges or the preset number of iterations is reached, and the trained target detection model is obtained.
  • Optionally, in the step of obtaining the target detection model and training it through the training data set to obtain the trained target detection model, the sample image can be input into the target detection model to obtain the sample detection frame corresponding to the sample rotating target; the sample detection frame and the annotation frame are each encoded through a preset encoding function to obtain the sample function distribution corresponding to the sample detection frame and the annotation function distribution corresponding to the annotation frame; the network parameters of the target detection model are adjusted according to the metric distance between the sample function distribution and the annotation function distribution; and the network parameter adjustment process is iterated until the target detection model converges or the preset number of iterations is reached, giving the trained target detection model.
  • the sample image can be input into the target detection model to obtain the sample detection frame output by the target detection model.
  • the sample detection frame can be obtained through the prediction results output by the classification feature output network, height and width feature output network, rotation angle feature output network, and offset feature output network in the target detection model.
  • After the sample detection frame is obtained, the loss between the sample detection frame and the annotation frame can be calculated and back-propagated to adjust the network parameters of each network in the target detection model; iterating the above process completes the training of the target detection model.
  • Further, in order to improve the auxiliary effect of the auxiliary features and hence the detection accuracy of the target detection model, the embodiments of the present invention encode the sample detection frame and the annotation frame separately through an encoding function, which couples the classification features in the sample detection frame with the auxiliary features. Specifically, the annotation frame position is coupled with the height-width features and the rotation angle features through the encoding function, so that the target detection model can learn this coupling relationship, making the auxiliary features output by the trained target detection model more accurate.
  • Furthermore, the above encoding function can be a nonlinear distribution function, for example a two-dimensional Gaussian distribution function. Through this encoding, a detection frame (x, y, w, h, θ) is encoded into a two-dimensional Gaussian distribution (μ, Σ), where (x, y, w, h, θ) is the representation of the detection frame, (x, y) are the coordinates of the center point of the detection frame, (w, h) are the width and height of the detection frame, and θ is the rotation angle of the rotating target in the detection frame; μ represents the mean of the converted two-dimensional Gaussian distribution, and Σ represents the covariance of the converted two-dimensional Gaussian distribution.
  • Both the sample detection frame and the labeling frame can be encoded through the above formula.
  • After the sample detection frame is encoded, the sample function distribution (μ₁, Σ₁) is obtained; after the annotation frame is encoded, the annotation function distribution (μ₂, Σ₂) is obtained.
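  • A sketch of this encoding is given below, using the standard box-to-Gaussian form μ = (x, y)ᵀ and Σ = R(θ)·diag(w²/4, h²/4)·R(θ)ᵀ; that this is the exact formula used in the patent is an assumption, since only the notation (μ, Σ) is reproduced above.

```python
import numpy as np

def rbox_to_gaussian(x, y, w, h, theta):
    """Sketch of the encoding function described above: a rotated detection
    frame (x, y, w, h, theta) is encoded as a 2-D Gaussian (mu, sigma). The
    covariance form R diag(w^2/4, h^2/4) R^T is the standard choice for this
    encoding and is an assumption about the patent's exact formula."""
    mu = np.array([x, y], dtype=np.float64)                  # mean = box center
    r = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])          # rotation matrix R(theta)
    lam = np.diag([(w / 2.0) ** 2, (h / 2.0) ** 2])          # axis-aligned spread
    sigma = r @ lam @ r.T                                     # covariance of the Gaussian
    return mu, sigma
```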
  • The larger the metric distance between the sample function distribution and the annotation function distribution, the less consistent the detection results are with the real results.
  • The above metric distance can be calculated using calculation methods such as the Wasserstein distance and the KL divergence. In the embodiments of the present invention, the Wasserstein distance is preferably used to calculate the metric distance between the sample function distribution (μ₁, Σ₁) and the annotation function distribution (μ₂, Σ₂), where d denotes the metric distance between the two distributions and the Tr() function represents the trace of a matrix.
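  • A sketch of the metric distance is given below, using the standard closed-form 2nd-order Wasserstein distance between two Gaussians, which involves a matrix trace as mentioned above; whether this closed form matches the patent's exact formula is an assumption.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_wasserstein_distance(mu1, sigma1, mu2, sigma2):
    """Sketch of the metric distance described above: the 2nd-order Wasserstein
    distance between the sample distribution (mu1, sigma1) and the annotation
    distribution (mu2, sigma2), in its standard closed form."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    root = sqrtm(sqrtm(sigma1) @ sigma2 @ sqrtm(sigma1))      # (S1^1/2 S2 S1^1/2)^1/2
    cov_term = np.trace(sigma1 + sigma2 - 2.0 * np.real(root))
    return float(np.sqrt(max(mean_term + cov_term, 0.0)))     # distance d
```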
  • the classification features and auxiliary features are coupled through the encoding function, which improves the learning ability of the target detection model for auxiliary features, so that the trained target detection model can extract more Accurate auxiliary features.
  • In a possible embodiment, the target detection model also includes an offset feature output network. The multi-scale features of the sample image are extracted through the feature extraction network and fused through the feature fusion network; the fused features are then predicted by the classification feature output network to obtain the classification features, by the height-width feature output network to obtain the height-width features, by the rotation angle feature output network to obtain the rotation angle features, and by the offset feature output network to obtain the offset features.
  • Key points are extracted from the classification features to obtain the target key points; based on the target key points, the corresponding target offset attributes are indexed in the offset features, and based on the target key points, the target height-width attributes and the target rotation angle attributes, the detection result of the sample rotating target is obtained. The detection result of the sample rotating target corresponds to the sample detection frame.
  • the classification features and the auxiliary features are coupled, so that the trained target detection model can output more accurate auxiliary features.
  • Optionally, in the step of inputting the sample image into the target detection model to obtain the sample detection frame corresponding to the sample rotating target, the sample image can be processed through the target detection model to obtain the sample feature maps corresponding to the sample image, and a matrix grid is constructed for the sample feature maps according to their height and width. The sample feature maps include the classification feature map, the height-width feature map and the rotation angle feature map corresponding to the sample image; an index of the height-width attribute is established for each grid point in the height-width feature map, and an index of the rotation angle attribute is established for each grid point in the rotation angle feature map; based on each grid point in the sample feature maps and its corresponding indexed attributes, the sample detection frame corresponding to the sample rotating target is obtained.
  • the sample image can be processed through the target detection model to obtain the sample feature map corresponding to the sample image.
  • the above sample feature map includes a classification feature map, a height and width feature map, and a rotation angle feature map. In a possible embodiment, it may also include an offset feature map.
  • the classification feature map, height-width feature map, rotation angle feature map and offset feature map have the same height H and width W.
  • Based on each grid point in the sample feature maps and its corresponding indexed attributes, the sample detection frame (x₁, y₁, w₁, h₁, θ₁) is obtained.
  • the trained target detection model can output more accurate auxiliary features.
  • the annotation box includes annotation key points.
  • In the step of adjusting the network parameters of the target detection model according to the metric distance between the sample function distribution and the annotation function distribution, a first loss between the sample key points of the classification feature map and the annotation key points can be calculated; the metric distance is converted into a second loss through a preset conversion function; and the network parameters of the target detection model are adjusted based on the first loss and the second loss.
  • The above sample key points are the center points in the sample detection frame, and they are calculated in the same way as the above target key points, namely through the maximum pooling kernel; the above annotation key points are obtained from the annotation. The first loss between the sample key points and the annotation key points can be calculated through a first loss function.
  • The first loss function can be written as loss_hm = Gaussian_focal_loss(hm_pred, hm_target), where loss_hm is the first loss, hm_pred is the prediction result for the center point of the sample detection frame, hm_target is the true label corresponding to the center point of the annotation frame, and Gaussian_focal_loss() is the first loss function.
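  • A sketch of such a Gaussian focal loss, as commonly used for center-point heatmaps, is given below; the exponents and this exact formulation are assumptions, not taken from the patent.

```python
import torch

def gaussian_focal_loss(hm_pred, hm_target, alpha=2.0, gamma=4.0, eps=1e-6):
    """Sketch of the first loss described above: a focal loss on the predicted
    center-point heatmap against a Gaussian-smoothed target heatmap."""
    pos = hm_target.eq(1).float()                       # annotated key-point locations
    neg = 1.0 - pos
    pos_loss = -((1 - hm_pred) ** alpha) * torch.log(hm_pred + eps) * pos
    neg_loss = -((1 - hm_target) ** gamma) * (hm_pred ** alpha) \
               * torch.log(1 - hm_pred + eps) * neg
    num_pos = pos.sum().clamp(min=1.0)                  # normalize by number of key points
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```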
  • The above second loss is obtained through a preset conversion function, which may be a nonlinear function of the metric distance. In this conversion, loss_rbbox denotes the second loss, d is the metric distance between the sample function distribution and the annotation function distribution, and the conversion also involves an adjustable constant.
  • Based on the first loss and the second loss, the total loss is Loss = loss_hm + γ·loss_rbbox, where γ is a prior coefficient that can be adjusted based on prior knowledge during the training process.
  • In a possible embodiment, the auxiliary features also include offset features. In this case, the loss between the offset attribute corresponding to the sample detection frame and the annotated offset attribute corresponding to the annotation frame can be calculated as a third loss. The third loss can be calculated through a third loss function, which can be expressed as loss_offset = Smooth-L1(hm_pred, hm_target), where loss_offset is the third loss, hm_pred is the prediction result for the offset attribute of the sample detection frame, hm_target is the true label corresponding to the offset attribute of the annotation frame, and Smooth-L1() is the third loss function.
  • Based on the first loss, the second loss and the third loss, the total loss of the classification features and the auxiliary features can be obtained as Loss = loss_hm + γ₁·loss_offset + γ₂·loss_rbbox, where γ₁ is a first prior coefficient and γ₂ is a second prior coefficient; both prior coefficients can be adjusted according to prior knowledge during the training process.
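  • A sketch of combining the three losses as Loss = loss_hm + γ₁·loss_offset + γ₂·loss_rbbox is given below; the Smooth-L1 call follows the third loss described above, while the log-based conversion of the metric distance d into loss_rbbox and the default coefficient values are assumptions.

```python
import torch
import torch.nn.functional as F

def detection_loss(loss_hm, off_pred, off_target, d, gamma1=1.0, gamma2=1.0, tau=1.0):
    """Sketch of the total training loss described above. loss_hm is the
    already-computed heatmap (first) loss; d is the metric distance as a scalar
    tensor. The conversion log(1+d)/(tau+log(1+d)) is an assumed example of a
    nonlinear conversion function, not the patent's exact formula."""
    loss_offset = F.smooth_l1_loss(off_pred, off_target)   # third loss (offset attributes)
    loss_rbbox = torch.log1p(d) / (tau + torch.log1p(d))   # second loss from the metric distance
    return loss_hm + gamma1 * loss_offset + gamma2 * loss_rbbox
```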
  • the trained target detection model can output more accurate auxiliary features.
  • In the detection stage, after the coordinates (i, j) of the target key point are obtained, the target width attribute w and the target height attribute h of the corresponding position are indexed on the height-width features, the x-direction offset dx and the y-direction offset dy of the corresponding position are indexed on the offset features, and the target rotation angle of the corresponding position is indexed on the rotation angle features; the resulting values are then scaled back to the original image scale of the image to be detected, so the final detection result of the rotating target to be detected is expressed in the form (x′, y′, w′, h′, θ).
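  • A sketch of this final rescaling step is given below; the existence of a single downsampling stride between the feature map and the H₀ × W₀ input, and the use of simple ratio scaling back to the original image, are assumptions.

```python
def to_original_scale(x, y, w, h, theta, stride, orig_h, orig_w, h0, w0):
    """Sketch of the rescaling described above: a detection (x, y, w, h, theta)
    on the feature map is first mapped back to the H0 x W0 network input
    (multiplying by the downsampling stride) and then to the original image
    size, giving (x', y', w', h', theta)."""
    sx, sy = orig_w / w0, orig_h / h0          # input-to-original scaling ratios
    x_p, y_p = x * stride * sx, y * stride * sy
    w_p, h_p = w * stride * sx, h * stride * sy
    return x_p, y_p, w_p, h_p, theta           # angle is unchanged by scaling
```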
  • Because the detection result is obtained directly from the classification features and the auxiliary features, there is no need to design anchors, and therefore no non-maximum suppression is required, which improves the detection accuracy of rotating target detection and thereby improves its detection performance.
  • The target detection method provided by the embodiments of the present invention can be applied to devices such as smart phones, computers, and servers that can perform target detection.
  • Figure 3 is a schematic structural diagram of a target detection device provided by an embodiment of the present invention. As shown in Figure 3, the device includes:
  • the first acquisition module 301 is used to acquire an image to be detected, where the image to be detected includes a rotating target to be detected;
  • the extraction module 302 is used to perform feature extraction on the image to be detected through a trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected;
  • the processing module 303 is configured to perform auxiliary processing on the classification feature based on the auxiliary feature to obtain the detection result of the rotating target to be detected.
  • the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network.
  • the extraction module 302 includes:
  • the first extraction sub-module is used to extract features of the image to be detected through the feature extraction network to obtain multi-scale features of the image to be detected;
  • a fusion submodule used to perform feature fusion on the multi-scale features through the feature fusion network to obtain the fusion features of the image to be detected;
  • the first prediction sub-module is used to predict the fusion features through the classification feature output network to obtain the classification features of the rotating target to be detected; the classification features include feature channels, and different categories of rotating targets to be detected correspond to different feature channels;
  • the second prediction sub-module is used to predict the fusion feature through the auxiliary feature output network to obtain the auxiliary features of the rotating target to be detected.
  • auxiliary features include height and width features and rotation angle features.
  • the processing module 303 includes:
  • the second extraction sub-module is used to extract key points from the classification features to obtain the target key points of the rotating target to be detected;
  • An index submodule configured to index the corresponding target height and width attributes in the height and width features based on the target key points, and index the corresponding target rotation angle attributes in the rotation angle features;
  • the first processing submodule is used to obtain the detection result of the rotating target to be detected based on the target key points, the target height and width attributes, and the target rotation angle attribute.
  • the device also includes:
  • An acquisition module used to acquire a training data set, the training data set includes a sample image and an annotation frame, the sample image includes a sample rotation target, and the annotation frame is an annotation frame of the sample rotation target;
  • a training module is used to obtain a target detection model, and train the target detection model through the training data set to obtain a trained target detection model.
  • the target detection model includes a feature extraction network, a feature fusion network, and a classification feature output. network, height and width feature output network and rotation angle feature output network.
  • the training module includes:
  • the second processing submodule is used to input the sample image into the target detection model to obtain the sample detection frame corresponding to the sample rotation target;
  • Encoding submodule used to encode the sample detection frame and the labeling frame through preset coding functions, respectively, to obtain the sample function distribution corresponding to the sample detection frame and the labeling function distribution corresponding to the labeling frame.
  • An adjustment submodule configured to adjust network parameters of the target detection model according to the metric distance between the sample function distribution and the label function distribution;
  • the iterative submodule is used to iterate the network parameter adjustment process of the target detection model until the target detection model converges or reaches a preset number of iterations to obtain a trained target detection model.
  • the second processing sub-module includes:
  • a first processing unit configured to process the sample image through the target detection model to obtain the sample feature maps corresponding to the sample image, and to construct a matrix grid for the sample feature maps according to their height and width, where the sample feature maps include the classification feature map, the height-width feature map and the rotation angle feature map corresponding to the sample image;
  • An index creation unit configured to establish an index of the height and width attributes for each grid point in the height-width feature map, and to establish an index of the rotation angle attribute for each grid point in the rotation angle feature map;
  • the second processing unit is configured to obtain a sample detection frame corresponding to the sample rotation target based on each grid point in the sample feature map and its corresponding index attribute.
  • Optionally, the annotation frame includes annotation key points, and the adjustment sub-module includes:
  • a calculation unit configured to calculate the first loss between the sample key points and the labeled key points according to the sample key points of the classification feature map
  • a conversion unit configured to convert the metric distance into a second loss through a preset conversion function
  • An adjustment unit configured to adjust network parameters of the target detection model based on the first loss and the second loss.
  • The target detection device provided by the embodiments of the present invention can be applied to smart phones, computers, servers and other equipment that can perform target detection.
  • the target detection device provided by the embodiment of the present invention can implement each process implemented by the target detection method in the above method embodiment, and can achieve the same beneficial effects. To avoid repetition, they will not be repeated here.
  • Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in Figure 4, the electronic device includes a memory 402, a processor 401, and a computer program for a target detection method stored in the memory 402 and executable on the processor 401, wherein:
  • the processor 401 is used to call the computer program stored in the memory 402 and perform the following steps:
  • the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, and an auxiliary feature output network.
  • The processor 401 performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected includes:
  • feature extraction is performed on the image to be detected through the feature extraction network to obtain multi-scale features of the image to be detected;
  • Feature fusion is performed on the multi-scale features through the feature fusion network to obtain the fusion features of the image to be detected;
  • the fusion features are predicted by the classification feature output network to obtain the classification features of the rotation target to be detected, the classification features include feature channels, and different categories of rotation targets to be detected correspond to different feature channels;
  • the fusion feature is predicted by the auxiliary feature output network to obtain the auxiliary feature of the rotating target to be detected.
  • the auxiliary features include height and width features and rotation angle features.
  • The processor 401 performing auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected includes:
  • a detection result of the rotating target to be detected is obtained.
  • the method executed by the processor 401 further includes:
  • the training data set includes a sample image and an annotation frame
  • the sample image includes a sample rotation target
  • the annotation frame is an annotation frame of the sample rotation target
  • the target detection model includes a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network, and a rotation angle feature output network.
  • the processor 401 executes the acquisition of the target detection model, and trains the target detection model through the training data set to obtain a trained target detection model, including:
  • the sample detection frame and the labeling frame are respectively encoded by preset encoding functions to obtain respectively the sample function distribution corresponding to the sample detection frame and the labeling function distribution corresponding to the labeling frame;
  • the network parameter adjustment process of the target detection model is iterated until the target detection model converges or reaches a preset number of iterations, and a trained target detection model is obtained.
  • the processor 401 executes the input of the sample image into the target detection model to obtain the sample detection frame corresponding to the sample rotation target, including:
  • the sample image is processed by the target detection model to obtain a sample feature map corresponding to the sample image, and a matrix grid is constructed for the sample feature map according to the height and width of the sample feature map.
  • the sample feature maps include a classification feature map, a height-width feature map and a rotation angle feature map corresponding to the sample image;
  • an index of the height-width attribute is established for each grid point in the height-width feature map, and an index of the rotation angle attribute is established for each grid point in the rotation angle feature map;
  • annotation box includes annotation key points
  • The processor 401 adjusting the network parameters of the target detection model according to the metric distance between the sample function distribution and the annotation function distribution includes:
  • network parameters of the target detection model are adjusted.
  • the electronic device provided by the embodiment of the present invention can implement each process implemented by the target detection method in the above method embodiment, and can achieve the same beneficial effects. To avoid repetition, they will not be repeated here.
  • Embodiments of the present invention also provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • the target detection method or the application-end target detection method provided by the embodiment of the present invention is implemented.
  • Each process can achieve the same technical effect. To avoid duplication, it will not be described again here.
  • the program can be stored in a computer-readable storage medium.
  • the process may include the processes of the embodiments of each of the above methods.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, referred to as RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A target detection method, comprising: acquiring an image to be detected (101), the image to be detected including a rotating target to be detected; performing feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected (102); and performing auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected (103). By extracting the classification features and auxiliary features of the rotating target to be detected and using the auxiliary features to assist the classification features, the detection result of the rotating target to be detected is obtained without having to design anchors, and therefore without non-maximum suppression, which improves the detection accuracy of rotating target detection and thereby improves its detection performance.

Description

Target detection method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application with application number 202210819568.0 and invention title "Target detection method and apparatus, electronic device, and storage medium", filed with the China National Intellectual Property Administration on July 12, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a target detection method and apparatus, an electronic device, and a storage medium.
Background Art
Rotating target detection refers to detecting targets that have a rotation direction, i.e. the center point, width and height, and angle of the target need to be detected. It is common in target detection from overhead views, such as remote sensing image target detection and aerial image target detection. Current rotating target detection is usually based on anchor points and non-maximum suppression. Because rotating targets in an image vary in size and position, anchor-based detection algorithms have inherent shortcomings: to detect rotating targets of all sizes and positions, the anchor design becomes very complicated, since different aspect ratios and sizes must be designed; moreover, an inappropriate choice of anchor ratios and sizes also affects the subsequent non-maximum suppression, causing error accumulation and degrading the detection accuracy of the target detection model. Existing rotating target detection algorithms therefore suffer from low detection accuracy.
Summary of the Invention
Embodiments of the present invention provide a target detection method, aiming to solve the problem that rotating target detection algorithms in existing target detection have low detection accuracy. By extracting classification features and auxiliary features of the rotating target to be detected and using the auxiliary features to perform auxiliary processing on the classification features, the detection result of the rotating target to be detected is obtained; there is no need to design anchors and therefore no need for non-maximum suppression, which improves the detection accuracy of rotating target detection and thereby improves its detection performance.
In a first aspect, an embodiment of the present invention provides a target detection method for detecting rotating targets, the method comprising:
acquiring an image to be detected, the image to be detected including a rotating target to be detected;
performing feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected;
performing auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected.
Optionally, the trained target detection model includes a feature extraction network, a feature fusion network, a classification feature output network and an auxiliary feature output network, and performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected comprises:
performing feature extraction on the image to be detected through the feature extraction network to obtain multi-scale features of the image to be detected;
performing feature fusion on the multi-scale features through the feature fusion network to obtain fusion features of the image to be detected;
predicting the fusion features through the classification feature output network to obtain the classification features of the rotating target to be detected, the classification features including feature channels, different categories of rotating targets to be detected corresponding to different feature channels;
predicting the fusion features through the auxiliary feature output network to obtain the auxiliary features of the rotating target to be detected.
Optionally, the auxiliary features include height-width features and rotation angle features, and performing auxiliary processing on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected comprises:
extracting key points from the classification features to obtain target key points of the rotating target to be detected;
based on the target key points, indexing the corresponding target height-width attributes in the height-width features, and indexing the corresponding target rotation angle attributes in the rotation angle features;
obtaining the detection result of the rotating target to be detected based on the target key points, the target height-width attributes and the target rotation angle attributes.
Optionally, before performing feature extraction on the image to be detected through the trained target detection model to obtain the classification features and auxiliary features of the rotating target to be detected, the method further comprises:
acquiring a training data set, the training data set including sample images and annotation frames, a sample image including a sample rotating target, the annotation frame being the annotation frame of the sample rotating target;
acquiring a target detection model and training the target detection model through the training data set to obtain a trained target detection model, the target detection model including a feature extraction network, a feature fusion network, a classification feature output network, a height-width feature output network and a rotation angle feature output network.
Optionally, acquiring the target detection model and training the target detection model through the training data set to obtain the trained target detection model comprises:
inputting the sample image into the target detection model to obtain a sample detection frame corresponding to the sample rotating target;
encoding the sample detection frame and the annotation frame respectively through preset encoding functions to obtain a sample function distribution corresponding to the sample detection frame and an annotation function distribution corresponding to the annotation frame;
adjusting network parameters of the target detection model according to a metric distance between the sample function distribution and the annotation function distribution;
iterating the network parameter adjustment process of the target detection model until the target detection model converges or a preset number of iterations is reached, to obtain the trained target detection model.
Optionally, inputting the sample image into the target detection model to obtain the sample detection frame corresponding to the sample rotating target comprises:
processing the sample image through the target detection model to obtain sample feature maps corresponding to the sample image, and constructing a matrix grid for the sample feature maps according to their height and width, the sample feature maps including a classification feature map, a height-width feature map and a rotation angle feature map corresponding to the sample image;
establishing an index of the height-width attribute for each grid point in the height-width feature map, and establishing an index of the rotation angle attribute for each grid point in the rotation angle feature map;
obtaining the sample detection frame corresponding to the sample rotating target according to each grid point in the sample feature maps and its corresponding indexed attributes.
Optionally, the annotation frame includes annotation key points, and adjusting the network parameters of the target detection model according to the metric distance between the sample function distribution and the annotation function distribution comprises:
calculating, according to the sample key points of the classification feature map, a first loss between the sample key points and the annotation key points;
converting the metric distance into a second loss through a preset conversion function;
adjusting the network parameters of the target detection model based on the first loss and the second loss.
In a second aspect, an embodiment of the present invention provides a target detection apparatus, the apparatus comprising:
a first acquisition module configured to acquire an image to be detected, the image to be detected including a rotating target to be detected;
an extraction module configured to perform feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected;
a processing module configured to perform auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the target detection method provided by the embodiments of the present invention when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the target detection method provided by the embodiments of the present invention.
In the embodiments of the present invention, an image to be detected is obtained, the image to be detected including a rotating target to be detected; feature extraction is performed on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected; and auxiliary processing is performed on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected. By extracting the classification features and auxiliary features of the rotating target to be detected and using the auxiliary features to assist the classification features, the detection result of the rotating target to be detected is obtained; there is no need to design anchors and therefore no need for non-maximum suppression, which improves the detection accuracy of rotating target detection and thereby improves its detection performance.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a flow chart of a target detection method provided by an embodiment of the present invention;
Figure 2 is a schematic structural diagram of a target detection model provided by an embodiment of the present invention;
Figure 3 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present invention;
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Figure 1, Figure 1 is a flow chart of a target detection method provided by an embodiment of the present invention. As shown in Figure 1, the target detection method is used to detect rotating targets and includes the following steps:
101. Acquire an image to be detected.
In the embodiments of the present invention, the image to be detected includes a rotating target to be detected. The image to be detected may be a side image, a top view image, a bottom view image, etc.; the side image may be an image captured from the side of the target, the top view image may be an image captured from above the target, and the bottom view image may be an image captured from below the target.
The rotating target to be detected may be a physical target such as a person, a vehicle, an aircraft, a building or an object.
102. Perform feature extraction on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected.
In the embodiments of the present invention, the image to be detected can be input into the trained target detection model, and feature extraction is performed on the image to be detected through the target detection model to obtain the classification features and auxiliary features of the rotating target to be detected.
In a possible embodiment, before the image to be detected is input into the trained target detection model, the image to be detected can be preprocessed; the preprocessing can include image pixel normalization and scaling of the width and height to a size of H₀ × W₀, where H₀ and W₀ are integer multiples of 32.
The classification features of the rotating target to be detected contain category information of the rotating target to be detected, for example whether the category of the rotating target to be detected is a person, a vehicle, an aircraft, a building, an object, etc. The auxiliary features of the rotating target to be detected may contain attribute information such as the height, width and rotation angle of the rotating target to be detected.
Specifically, the trained target detection model includes a classification feature branch structure and an auxiliary feature branch structure. The trained target detection model can first extract shared features from the image to be detected, output the corresponding classification features through the classification feature branch structure, and output the corresponding auxiliary features through the auxiliary feature branch structure. The classification feature branch structure and the auxiliary feature branch structure have different structural parameters.
Further, the target detection model can be constructed based on a deep convolutional neural network; after the deep convolutional neural network is trained, the trained target detection model is obtained. Specifically, sample images can be collected, where the sample images include sample rotating targets, and the sample rotating targets may be persons, vehicles, aircraft, buildings, objects, etc. The sample rotating targets in the sample images are annotated to obtain corresponding label data, where the annotation includes category annotations corresponding to the classification features and attribute annotations corresponding to the auxiliary features; the attribute annotations may include height-width annotations and rotation angle annotations. The deep convolutional neural network is trained through the sample images and the corresponding label data, so that it learns to output the classification features and the auxiliary features of the rotating target; once the training is completed, the trained target detection model is obtained.
103. Perform auxiliary processing on the classification features based on the auxiliary features to obtain a detection result of the rotating target to be detected.
In the embodiments of the present invention, the auxiliary features of the rotating target to be detected may contain attribute information such as the height, width and rotation angle of the rotating target to be detected; the attribute information indexed from the auxiliary features can be added to the classification features, thereby obtaining the detection result of the rotating target to be detected.
The detection result may include the position, category, height, width and rotation angle of the rotating target to be detected.
In the embodiments of the present invention, an image to be detected is obtained, the image to be detected including a rotating target to be detected; feature extraction is performed on the image to be detected through a trained target detection model to obtain classification features and auxiliary features of the rotating target to be detected; and auxiliary processing is performed on the classification features based on the auxiliary features to obtain the detection result of the rotating target to be detected. By extracting the classification features and auxiliary features of the rotating target to be detected and using the auxiliary features to assist the classification features, the detection result of the rotating target to be detected is obtained; there is no need to design anchors and therefore no need for non-maximum suppression, which improves the detection accuracy of rotating target detection and thereby improves its detection performance.
可选的,训练好的目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、辅助特征输出网络,在通过训练好的目标检测模型对待检测图像进行特征提取,得到待检测旋转目标的分类特征和辅助特征的步骤中,可以通过特征提取网络对待检测图像进行特征提取,得到待检测图像的多尺度特征;通过特征融合网络对多尺度特征进行特征融合,得到待检测图像的融合特征;通过分类特征输出网络对融合特征进行预测,得到待检测旋转目标的分类特征,分类特征包括特征通道,不同类别的待检测旋转目标对应于不同的特征通道;通过辅助特征输出网络对融合特征进行预测,得到待检测旋转目标的辅助特征。
在本发明实施例中,上述特征提取网络可以由backbone网络构成,比如VGG19、ResNet、MobileNet等,本发明实施例不对特征提取网络做任何限制。上述特征提取网络可以提取待检测图像在不同尺度下的特征,得到待检测图像的多尺度特征。需要说明的是,在特征提取网络中,由于下采样层的存在,特征提取网络的计算深度越深,提取到的特征尺度越小。
上述特征融合网络可以包括上采样层和融合层,通过上采样层对尺寸较小的特征进行上采样,使得尺寸较小的特征被上采样为尺寸较大的特征,进而通过融合层将上采样后的特征与相同尺寸的特征进行融合。具体的,特征融合网络从特征提取网络的不同阶段特征中提取多尺度特征,并逐一将小尺度的特征进行2倍上采样,再与特征提取网络中相同尺度的特征融合,最后将高尺度的融合特征输出到预测网络。
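为便于理解上述特征融合过程,下面给出一个基于PyTorch的示意性写法(通道数、融合方式等均为假设,并非本发明的限定实现):

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    # 示意性特征融合网络: 将小尺度特征逐级2倍上采样后与同尺度特征融合
    def __init__(self, channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # 1x1卷积将不同阶段特征统一到相同通道数(示意)
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in channels])

    def forward(self, feats):
        # feats: 来自特征提取网络不同阶段的多尺度特征, 按尺度从大到小排列
        feats = [l(f) for l, f in zip(self.lateral, feats)]
        fused = feats[-1]  # 从最小尺度开始
        for f in reversed(feats[:-1]):
            # 2倍上采样后与相同尺度的特征逐元素相加融合
            fused = f + F.interpolate(fused, scale_factor=2, mode='nearest')
        return fused  # 输出高尺度(高分辨率)的融合特征
```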
上述分类特征输出网络与辅助特征输出网络也可以称为预测网络,通过分类特征输出网络输出对应的分类特征,输出的分类特征中可以包括多个特征通道,每个特征通道对应一个待检测旋转目标的一个类别。
具体的,上述分类特征输出网络可以是基于CenterNetR的分类特征输出网络,可以将融合特征输入到基于CenterNetR的分类特征输出网络,通过CenterNetR预测待检测旋转目标的中心点热力图作为分类特征,不同类别的中心点热力图被分布在不同的特征通道中,因此,可以由特征通道确定待检测旋转目标的类别。
上述辅助特征输出网络也可以是基于CenterNetR的属性特征输出网络,可以将融合特征输入到基于CenterNetR的属性特征输出网络,通过CenterNetR预测每个中心点对应的属性信息作为辅助特征,其中上述辅助特征的尺度分辨率与分类特征的尺度分辨率相同。上述辅助特征可以包含待检测旋转目标的高宽以及旋转角度等属性信息。在辅助特征中,每个位置点均对应一组属性信息,通过分类特征的中心点位置,可以索引到对应位置的属性信息。
可选的,辅助特征包括高宽特征以及旋转角度特征,基于辅助特征对分类特征进行辅助处理,得到待检测旋转目标的检测结果的步骤中,可以对分类特征进行关键点提取,得到待检测旋转目标的目标关键点;基于目标关键点,在高宽特征中索引对应的目标高宽属性,以及在旋转角度特征中索引对应的目标旋转角度属性;基于目标关键点、目标高宽属性以及目标旋转角度属性,得到待检测旋转目标的检测结果。
在本发明实施例中,上述辅助特征包括高宽特征以及旋转角度特征,上述高宽特征对应于待检测旋转目标的高宽属性,上述旋转角度特征对应于待检测旋转目标的旋转角度属性。分类特征中每个位置点对应于一个高宽特征中的位置点以及一个旋转角度特征中的位置点,因此,可以根据分类特征中的目标关键点,在高宽特征中对应位置点索引到高宽属性作为目标高宽属性,在旋转角度特征中对应位置点索引到旋转角度属性作为目标旋转角度属性。
具体的,高宽特征以及旋转角度特征均与分类特征具有相同的尺度分辨率,上述分类特征的高为H,宽为W,同样的,高宽特征的高为H,宽为W,旋转角度特征的高为H,宽为W。分类特征可以是中心点热力图,热力图的中心点为热力值最高的位置点,该中心点也可以作为目标关键点。具体的,在得到分类特征后,可以通过一个n*n的最大池化核对分类特征进行采样,并根据预设的置信度阈值得到高置信度的关键点作为目标关键点。其中,n小于H,且n小于W。最大池化核的作用为在热力图n*n区域中采样最大值。以n=3进行举例,可以在热力图3*3区域中采样出热力值最高的值作为最大池化核的采样值。
在得到目标关键点(i,j)后,(i,j)表示目标关键点在分类特征中的位置坐标,可以根据目标关键点(i,j)在高宽特征中索引到对应的目标高宽属性(w,h),其中w表示待检测旋转目标在分类特征中的宽度,h表示待检测旋转目标在分类特征中的高度。同时,可根据目标关键点(i,j)在旋转角度特征中索引到对应的目标旋转角度属性θ,其中θ表示待检测旋转目标在分类特征中的旋转角度。
根据目标关键点(i,j)、目标高宽属性(w,h)以及目标旋转角度属性θ,可以得到待检测旋转目标的检测结果(i,j,w,h,θ)。
在一种可能的实施例中,上述辅助特征还可以包含偏置特征,上述偏置特征用于描述目标关键点的偏移量。具体的,上述偏置特征可以通过在目标检测模型中增加偏置特征输出网络,由偏置特征输出网络对融合特征进行预测得到。偏置特征与分类特征也具有相同的尺度分辨率,偏置特征的高为H,宽为W。分类特征中每个位置点对应于一个偏置特征中的位置点。可以根据目标关键点(i,j)在偏置特征中索引到对应的目标偏置属性(dx,dy),则待检测旋转目标的最终位置为(x,y),其中,x=i+dx,y=j+dy。结合目标高宽属性(w,h)以及目标旋转角度属性θ,可以得到待检测旋转目标的检测结果(x,y,w,h,θ)。通过目标关键点在偏置特征中索引得到对应的目标偏置属性,可以更准确地确定待检测旋转目标的位置。
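综合上述关键点提取与属性索引过程,可给出如下示意性解码片段(基于PyTorch;假设热力图已经过sigmoid归一化,各特征张量排布为(C,H,W)、(2,H,W)、(1,H,W)、(2,H,W),阈值0.3与坐标约定i为横向、j为纵向均为示例假设):

```python
import torch
import torch.nn.functional as F

def decode_detections(heatmap, wh, angle, offset, n=3, conf_thresh=0.3):
    # heatmap: (C, H, W) 中心点热力图; wh: (2, H, W); angle: (1, H, W); offset: (2, H, W)
    hmax = F.max_pool2d(heatmap[None], n, stride=1, padding=n // 2)[0]  # n*n最大池化
    keep = (hmax == heatmap) & (heatmap > conf_thresh)  # 局部极大且高置信度的目标关键点
    cls_ids, ys, xs = torch.nonzero(keep, as_tuple=True)  # (类别通道, 纵坐标j, 横坐标i)
    results = []
    for c, j, i in zip(cls_ids.tolist(), ys.tolist(), xs.tolist()):
        w, h = wh[0, j, i].item(), wh[1, j, i].item()            # 索引目标高宽属性
        theta = angle[0, j, i].item()                            # 索引目标旋转角度属性
        dx, dy = offset[0, j, i].item(), offset[1, j, i].item()  # 索引目标偏置属性
        results.append((c, i + dx, j + dy, w, h, theta))         # x=i+dx, y=j+dy
    return results
```

由于关键点直接由局部极大值与置信度阈值给出,上述解码过程不依赖anchor,也无需再进行非极大值抑制。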
可选的,在通过训练好的目标检测模型对待检测图像进行特征提取,得到待检测旋转目标的分类特征和辅助特征之前,还可以获取训练数据集,训练数据集中包括样本图像以及标注框,样本图像中包括样本旋转目标,标注框为样本旋转目标的标注框;获取目标检测模型,并通过训练数据集对目标检测模型进行训练,得到训练好的目标检测模型,目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、高宽特征输出网络以及旋转角度特征输出网络。
在本发明实施例中,在通过训练好的目标检测模型对待检测图像进行特征提取之前,可以对目标检测模型进行训练。可以收集包含样本旋转目标的样本图像进行标注,从而得到训练数据集,样本旋转目标与待检测旋转目标具有相同的类别。可以通过专家人员在样本图像中对旋转目标进行标注,得到标注框,标注框包括样本旋转目标的类别、标注框位置、标注框高宽以及标注框旋转角度。在一种可能的实施例中,辅助特征包括偏置特征,则标注框还包括目标偏置,此时,目标检测模型还包括偏置特征输出网络。需要说明的是,分类特征输出网络、高宽特征输出网络、旋转角度特征输出网络以及偏置特征输出网络均为独立且并行的分支网络。
具体的,请参考图2,图2是本发明实施例提供的一种目标检测模型的结构示意图,如图2所示,在目标检测模型中,特征提取网络的输出与特征融合网络的输入连接,特征融合网络的输出分别与分类特征输出网络、高宽特征输出网络、旋转角度特征输出网络以及偏置特征输出网络的输入连接。
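结合图2所示的连接关系,目标检测模型的整体结构可参考如下示意性代码(其中backbone与fusion为满足相应接口的任意模块,通道数等超参数均为假设,并非本发明的限定实现):

```python
import torch.nn as nn

class RotatedDetector(nn.Module):
    # 示意性结构: 特征提取网络 -> 特征融合网络 -> 四个独立且并行的输出分支
    def __init__(self, backbone, fusion, num_classes, channels=256):
        super().__init__()
        self.backbone = backbone  # 特征提取网络(如ResNet等, 示意), 输出多尺度特征列表
        self.fusion = fusion      # 特征融合网络, 输出融合特征

        def head(out_channels):
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, out_channels, 1))

        self.cls_head = head(num_classes)  # 分类特征输出网络(中心点热力图)
        self.wh_head = head(2)             # 高宽特征输出网络
        self.angle_head = head(1)          # 旋转角度特征输出网络
        self.offset_head = head(2)         # 偏置特征输出网络

    def forward(self, x):
        fused = self.fusion(self.backbone(x))     # 融合特征
        return (self.cls_head(fused).sigmoid(),   # 分类特征(热力图, 示意性地经sigmoid归一化)
                self.wh_head(fused),
                self.angle_head(fused),
                self.offset_head(fused))
```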
通过训练数据集对目标检测模型进行训练,在训练过程中,迭代调整特征提取网络、特征融合网络、分类特征输出网络、高宽特征输出网络、旋转角度特征输出网络以及偏置特征输出网络中的网络参数,直到目标检测模型收敛或达到预设的迭代次数,得到训练好的目标检测模型。
可选的,在获取目标检测模型,并通过训练数据集对目标检测模型进行训练,得到训练好的目标检测模型的步骤中,可以将样本图像输入到目标检测模型,得到样本旋转目标对应的样本检测框;将样本检测框与标注框分别通过预设的编码函数进行编码,分别得到样本检测框对应的样本函数分布,以及标注框对应的标注函数分布;根据样本函数分布与标注函数分布之间的度量距离,对目标检测模型进行网络参数调整;对目标检测模型的网络参数调整过程进行迭代,直到目标检测模型收敛或达到预设的迭代次数,得到训练好的目标检测模型。
在本发明实施例中,可以在训练过程中,将样本图像输入到目标检测模型中,得到目标检测模型输出的样本检测框。样本检测框可以通过目标检测模型中分类特征输出网络、高宽特征输出网络、旋转角度特征输出网络以及偏置特征输出网络输出的预测结果得到。在得到样本检测框后,可以计算样本检测框与标注框之间的损失,并通过该损失进行反向传播,调整目标检测模型中各个网络的网络参数,迭代上述过程,完成对目标检测模型的训练。
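上述训练与迭代调参过程可用如下示意性训练循环表示(其中compute_loss为按后文损失定义实现的假设函数,优化器与学习率等超参数均为示例):

```python
import torch

def train(model, dataloader, epochs=12, lr=1e-4):
    # 示意性训练循环: 前向得到预测、计算损失、反向传播并迭代调整各网络的网络参数
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for images, targets in dataloader:
            hm, wh, angle, offset = model(images)                 # 各分支的预测结果
            loss = compute_loss(hm, wh, angle, offset, targets)   # 样本检测框与标注框之间的损失(假设函数)
            optimizer.zero_grad()
            loss.backward()    # 反向传播
            optimizer.step()   # 调整网络参数
    return model
```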
进一步的,为了提高辅助特征的辅助效果,进而提高目标检测模型的检测准确性,本发明实施例将样本检测框和标注框分别通过编码函数进行编码,通过编码函数将样本检测框中分类特征与辅助特征进行编码耦合,具体的,通过编码函数将标注框位置与高宽特征、旋转角度特征进行编码耦合,从而使目标检测模型学习到这种耦合联系,使得训练好的目标检测模型输出更准确的辅助特征。
更进一步的,上述编码函数可以是非线性分布函数,比如可以是二维高斯分布函数,具体的,可以如下述式子所示:
μ=(x,y)^T
Σ=R(θ)·diag(w²/4, h²/4)·R(θ)^T,其中R(θ)为由旋转角度θ确定的二维旋转矩阵
其中,上述(x,y,w,h,θ)为检测框的表达形式,(x,y)为检测框的中心点坐标,(w,h)为检测框的宽和高,θ为检测框中旋转目标的旋转角度。通过上述式子,将检测框(x,y,w,h,θ)编码为二维高斯分布形式(μ,Σ),具体的,μ表示转换后的二维高斯分布的均值,Σ表示转换后的二维高斯分布的协方差。样本检测框和标注框的编码都可以通过上述式子进行。
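上述编码过程的一个示意性实现如下(采用以半宽、半高为尺度的常见高斯编码方式,假设θ为弧度,具体形式以本发明公式为准):

```python
import math
import torch

def rbox_to_gaussian(x, y, w, h, theta):
    # 将旋转框(x, y, w, h, θ)编码为二维高斯分布(μ, Σ)的示意实现
    mu = torch.tensor([x, y], dtype=torch.float32)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    R = torch.tensor([[cos_t, -sin_t], [sin_t, cos_t]], dtype=torch.float32)  # 旋转矩阵
    S = torch.diag(torch.tensor([w / 2.0, h / 2.0], dtype=torch.float32))     # 半宽、半高
    sigma = R @ S @ S @ R.T                                                   # Σ = R·diag(w²/4, h²/4)·Rᵀ
    return mu, sigma
```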
在对样本检测框进行编码后,得到样本函数分布(μ₁,Σ₁);在对标注框进行编码后,得到标注函数分布(μ₂,Σ₂)。计算样本函数分布(μ₁,Σ₁)与标注函数分布(μ₂,Σ₂)之间的度量距离。在样本图像为正样本的情况下,度量距离越小,则说明样本检测框与标注框越相似,检测结果越符合真实结果;度量距离越大,则说明样本检测框与标注框越不相似,检测结果越不符合真实结果。在样本图像为负样本的情况下,度量距离越小,则说明样本检测框与标注框越相似,检测结果越不符合真实结果;度量距离越大,则说明样本检测框与标注框越不相似,检测结果越符合真实结果。
上述度量距离的计算可以采用Wasserstein距离和KL散度等计算方法进行,本发明实施例优选Wasserstein距离来计算样本函数分布(μ₁,Σ₁)与标注函数分布(μ₂,Σ₂)之间的度量距离,具体可以如下述式子所示:
d²=‖μ₁-μ₂‖²+Tr(Σ₁+Σ₂-2(Σ₁^(1/2)Σ₂Σ₁^(1/2))^(1/2))
其中,d为样本函数分布(μ₁,Σ₁)与标注函数分布(μ₂,Σ₂)之间的度量距离,Tr(·)表示求矩阵的迹。
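上述度量距离的计算可参考如下示意性实现(利用2×2对称正定矩阵的迹与行列式化简矩阵平方根,数值稳定项eps等细节为假设):

```python
import torch

def gaussian_wasserstein_distance(mu1, sigma1, mu2, sigma2, eps=1e-7):
    # 两个二维高斯分布之间2阶Wasserstein距离的示意实现
    mean_term = ((mu1 - mu2) ** 2).sum()  # 均值项: ||μ1 - μ2||²
    # 对2x2对称正定矩阵有 Tr((S1^(1/2) S2 S1^(1/2))^(1/2)) = sqrt(Tr(S1·S2) + 2·sqrt(det(S1)·det(S2)))
    det_term = torch.sqrt((torch.det(sigma1) * torch.det(sigma2)).clamp(min=eps))
    cross_term = torch.sqrt((torch.trace(sigma1 @ sigma2) + 2.0 * det_term).clamp(min=eps))
    cov_term = torch.trace(sigma1) + torch.trace(sigma2) - 2.0 * cross_term
    d2 = (mean_term + cov_term).clamp(min=eps)
    return torch.sqrt(d2)  # 返回度量距离d
```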
在训练过程中,通过对样本检测框和标注框进行编码,使得分类特征与辅助特征通过编码函数进行耦合,提高目标检测模型对于辅助特征的学习能力,使得训练好的目标检测模型能够提取出更准确的辅助特征。
在一种可能的实施例中,目标检测模型中还包括偏置特征输出网络,通过特征提取网络提取样本图像的多尺度特征,通过特征融合网络将样本图像的多尺度特征进行融合。通过分类特征输出网络对融合后的特征进行预测处理,得到分类特征;通过高宽特征输出网络对融合后的特征进行预测处理,得到高宽特征;通过旋转角度特征输出网络对融合后的特征进行预测处理,得到旋转角度特征;通过偏置特征输出网络对融合后的特征进行预测处理,得到偏置特征。对上述分类特征进行关键点提取,得到目标关键点,基于目标关键点,在高宽特征中索引对应的目标高宽属性,在旋转角度特征中索引对应的目标旋转角度属性,在偏置特征中索引对应的目标偏置属性,基于目标关键点、目标偏置属性、目标高宽属性以及目标旋转角度属性,得到样本旋转目标的检测结果,样本旋转目标的检测结果对应样本检测框。
通过在目标检测模型中增加辅助特征对应的输出网络,将分类特征与辅助特征之间进行耦合,使得训练好的目标检测模型可以输出更准确的辅助特征。
可选的,在将样本图像输入到目标检测模型,得到样本旋转目标对应的样本检测框的步骤中,可以通过目标检测模型对样本图像进行处理,得到样本图像对应的样本特征图,并根据样本特征图的高宽,对样本特征图构建矩阵网格,样本特征图包括样本图像对应的分类特征图、高宽特征图与旋转角度特征图;在高宽特征图中每个网格点建立一个高宽属性的索引,以及在旋转角度特征图中每个网格点建立一个旋转角度属性的索引;根据样本特征图中每个网格点及其对应的索引属性,得到样本旋转目标对应的样本检测框。
在本发明实施例中,可以通过目标检测模型对样本图像进行处理,得到样本图像对应的样本特征图。上述样本特征图中包括分类特征图、高宽特征图、以及旋转角度特征图,在一种可能的实施例中,还可以包括偏置特征图。分类特征图、高宽特征图、旋转角度特征图以及偏置特征图具有相同的高H和宽W。
可以根据样本特征图的高H和宽W,为样本特征图创建矩阵网格,具体可以通过meshgrid为样本特征图创建矩阵网格。对于一个网格点(i₀,j₀),在高宽特征图中可以建立对应于该网格点(i₀,j₀)的检测框宽度属性w₀和检测框高度属性h₀的索引关系,在旋转角度特征图中也可以建立对应于该网格点(i₀,j₀)的旋转角度属性θ₀的索引关系,在偏置特征图中也可以建立对应于该网格点(i₀,j₀)的x方向偏置属性dx₀和y方向偏置属性dy₀的索引关系。对于样本检测框的中心点(i₁,j₁),则可以索引到样本高宽属性(w₁,h₁)、样本旋转角度属性θ₁、样本偏置属性(dx₁,dy₁),从而得到样本检测框(x₁,y₁,w₁,h₁,θ₁),其中x₁=i₁+dx₁,y₁=j₁+dy₁。
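上述矩阵网格与属性索引关系可用如下示意性片段表示(特征图高宽、网格点坐标以及随机特征图均为示例值):

```python
import torch

H, W = 128, 128  # 样本特征图的高与宽(示例值)
# 根据高宽构建矩阵网格, 网格点为(i0, j0)
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')

wh_map = torch.rand(2, H, W)      # 高宽特征图: 每个网格点索引一组(w0, h0)
angle_map = torch.rand(1, H, W)   # 旋转角度特征图: 每个网格点索引θ0
offset_map = torch.rand(2, H, W)  # 偏置特征图: 每个网格点索引(dx0, dy0)

# 对样本检测框的中心网格点(i1, j1)索引对应的属性
i1, j1 = 40, 65  # 示例网格点
w1, h1 = wh_map[0, j1, i1], wh_map[1, j1, i1]
theta1 = angle_map[0, j1, i1]
dx1, dy1 = offset_map[0, j1, i1], offset_map[1, j1, i1]
x1, y1 = xs[j1, i1] + dx1, ys[j1, i1] + dy1  # x1 = i1 + dx1, y1 = j1 + dy1
sample_box = (float(x1), float(y1), float(w1), float(h1), float(theta1))
```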
通过在目标检测模型中增加辅助特征对应的输出网络,建立分类特征与辅助特征之间的索引关系,使得训练好的目标检测模型可以输出更准确的辅助特征。
可选的,标注框内包括标注关键点,在根据样本函数分布与标注函数分布之间的度量距离,对目标检测模型进行网络参数调整的步骤中,可以根据分类特征图的样本关键点,计算样本关键点与标注关键点之间的第一损失;通过预设的转换函数将所述度量距离转换为第二损失;基于第一损失与第二损失,对目标检测模型进行网络参数调整。
在本发明实施例中,上述样本关键点为样本检测框中的中心点,上述样本关键点的计算方式与上述目标关键点的计算方式相同,均是通过最大池化核进行计算得到。上述标注关键点通过标注获取,计算样本关键点与标注关键点之间的第一损失,可以通过第一损失函数进行计算,第一损失函数如下述式子所示:
loss_hm=Gaussian_focal_loss(hm_pred,hm_target)
其中,loss_hm为第一损失,hm_pred为样本检测框中心点的预测结果,hm_target为对应标注框中心点的真实标签,Gaussian_focal_loss(·)为第一损失函数。
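第一损失(Gaussian focal loss)的一个常见实现可示意如下(α、γ等超参数取值为CenterNet系列方法中的常见默认值,并非本发明的限定):

```python
import torch

def gaussian_focal_loss(hm_pred, hm_target, alpha=2.0, gamma=4.0, eps=1e-7):
    # 示意实现: 正样本位置为标注关键点, 其余位置按高斯分布加权的负样本
    pos_mask = hm_target.eq(1).float()            # 标注关键点位置
    neg_mask = 1.0 - pos_mask
    neg_weight = (1.0 - hm_target).pow(gamma)     # 降低标注中心点附近负样本的权重
    pos_loss = -torch.log(hm_pred.clamp(min=eps)) * (1.0 - hm_pred).pow(alpha) * pos_mask
    neg_loss = -torch.log((1.0 - hm_pred).clamp(min=eps)) * hm_pred.pow(alpha) * neg_weight * neg_mask
    num_pos = pos_mask.sum().clamp(min=1.0)
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```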
上述第二损失通过预设的转换函数得到,上述预设的转换函数可以是非线性函数,具体的,上述预设的转换函数可以如下述式子所示:
loss_rbbox=1-1/(τ+f(d))
其中,loss_rbbox为第二损失,d为样本函数分布与标注函数分布之间的度量距离,τ为可调整常数,f(·)为上述非线性转换函数。
基于第一损失和第二损失,可以得到分类特征与辅助特征的总损失,该总损失如下述式子所示:
Loss=loss_hm+λ·loss_rbbox
上述λ为先验系数,可以在训练过程中根据先验知识进行调整。
在一种可能的实施例中,辅助特征还包括偏置特征,可以计算样本检测 框对应的偏置属性与标注框对应的标注偏置属性之间的损失作为第三损失,第三损失可以通过第三损失函数进行计算得到,第三损失函数可以如下述式子所示:
loss_offset=Smooth-L1(offset_pred,offset_target)
其中,loss_offset为第三损失,offset_pred为样本检测框偏置属性的预测结果,offset_target为对应标注框偏置属性的真实标签,Smooth-L1(·)为第三损失函数。
在该实施例中,基于第一损失、第二损失和第三损失,可以得到分类特征与辅助特征的总损失,该总损失如下述式子所示:
Loss=loss hm1loss offset2loss rbbox
其中,上述λ₁为第一先验系数,λ₂为第二先验系数,第一先验系数和第二先验系数均可以在训练过程中根据先验知识进行调整。
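上述三项损失的组合可用如下示意性片段表示(沿用前文示意的gaussian_focal_loss,第二损失中非线性转换取log(1+d)仅为示例,系数取值亦为假设):

```python
import torch
import torch.nn.functional as F

def total_loss(hm_pred, hm_target, offset_pred, offset_target, d,
               tau=1.0, lambda1=1.0, lambda2=1.0):
    # 总损失 = 第一损失 + λ1·第三损失 + λ2·第二损失
    loss_hm = gaussian_focal_loss(hm_pred, hm_target)             # 第一损失(关键点)
    loss_offset = F.smooth_l1_loss(offset_pred, offset_target)    # 第三损失(偏置)
    loss_rbbox = 1.0 - 1.0 / (tau + torch.log1p(d))               # 第二损失: 度量距离经非线性转换(示意形式)
    return loss_hm + lambda1 * loss_offset + lambda2 * loss_rbbox
```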
在本发明实施例中,通过增加辅助特征对应的输出网络,通过对应的第一损失和第二损失对辅助特征进行训练,使得训练好的目标检测模型可以输出更准确的辅助特征。
可选的,在基于辅助特征对分类特征进行辅助处理,得到所述待检测旋转目标的检测结果的步骤中,可以根据获得的目标关键点的坐标(i,j),在高宽特征上索引出相应位置的目标宽度属性w和目标高度属性h,在偏置特征上索引出相应位置的x方向偏置dx和y方向偏置dy,在旋转角度特征上索引出相应位置的目标旋转角度属性θ;待检测旋转目标可以表示为(x,y,w,h,θ)的形式,其中x=i+dx,y=j+dy;将所有检测结果的(x,y,w,h)四个元素值缩放到待检测图像的原图尺度(x′,y′,w′,h′),因此最终的待检测旋转目标的检测结果表示形式为(x′,y′,w′,h′,θ)。
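将检测结果缩放回原图尺度的操作可示意如下(假设按特征图与原图的宽高比例分别缩放,且旋转角度θ不参与缩放;当两个方向的缩放比例不一致时,旋转框的缩放为近似处理):

```python
def rescale_to_original(x, y, w, h, theta, feat_hw, orig_hw):
    # 将特征图尺度下的(x, y, w, h, θ)缩放回原图尺度的示意实现
    feat_h, feat_w = feat_hw
    orig_h, orig_w = orig_hw
    sx, sy = orig_w / float(feat_w), orig_h / float(feat_h)
    # 仅缩放(x, y, w, h)四个元素, θ保持不变
    return x * sx, y * sy, w * sx, h * sy, theta
```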
本发明实施例中,通过提取待检测旋转目标的分类特征和辅助特征,利用辅助特征来对分类特征进行辅助处理,从而得到待检测旋转目标的检测结果,不需要对anchor进行设计,因此也不需要进行非极大值抑制,提高了旋转目标检测的检测准确率,进而提高了旋转目标检测的检测性能。
需要说明的是,本发明实施例提供的目标检测方法可以应用于能够进行目标检测的智能手机、电脑、服务器等设备。
可选的,请参见图3,图3是本发明实施例提供的一种目标检测装置的结构示意图,如图3所示,所述装置包括:
第一获取模块301,用于获取待检测图像,所述待检测图像包括待检测 旋转目标;
提取模块302,用于通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征;
处理模块303,用于基于所述辅助特征对所述分类特征进行辅助处理,得到所述待检测旋转目标的检测结果。
可选的,所述训练好的目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、辅助特征输出网络,所述提取模块302包括:
第一提取子模块,用于通过所述特征提取网络对所述待检测图像进行特征提取,得到所述待检测图像的多尺度特征;
融合子模块,用于通过所述特征融合网络对所述多尺度特征进行特征融合,得到所述待检测图像的融合特征;
第一预测子模块,用于通过所述分类特征输出网络对所述融合特征进行预测,得到所述待检测旋转目标的分类特征,所述分类特征包括特征通道,不同类别的待检测旋转目标对应于不同的特征通道;
第二预测子模块,用于通过所述辅助特征输出网络对所述融合特征进行预测,得到所述待检测旋转目标的辅助特征。
可选的,所述辅助特征包括高宽特征以及旋转角度特征,所述处理模块303包括:
第二提取子模块,用于对所述分类特征进行关键点提取,得到所述待检测旋转目标的目标关键点;
索引子模块,用于基于所述目标关键点,在所述高宽特征中索引对应的目标高宽属性,以及在所述旋转角度特征中索引对应的目标旋转角度属性;
第一处理子模块,用于基于所述目标关键点、所述目标高宽属性以及所述目标旋转角度属性,得到所述待检测旋转目标的检测结果。
可选的,所述装置还包括:
获取模块,用于获取训练数据集,所述训练数据集中包括样本图像以及标注框,所述样本图像中包括样本旋转目标,所述标注框为所述样本旋转目标的标注框;
训练模块,用于获取目标检测模型,并通过所述训练数据集对所述目标检测模型进行训练,得到训练好的目标检测模型,所述目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、高宽特征输出网络以及旋转角度特征输出网络。
可选的,所述训练模块包括:
第二处理子模块,用于将所述样本图像输入到所述目标检测模型,得到所述样本旋转目标对应的样本检测框;
编码子模块,用于将所述样本检测框与所述标注框分别通过预设的编码函数进行编码,分别得到所述样本检测框对应的样本函数分布,以及所述标注框对应的标注函数分布;
调整子模块,用于根据所述样本函数分布与所述标注函数分布之间的度量距离,对所述目标检测模型进行网络参数调整;
迭代子模块,用于对所述目标检测模型的网络参数调整过程进行迭代,直到所述目标检测模型收敛或达到预设的迭代次数,得到训练好的目标检测模型。
可选的,所述第二处理子模块包括:
第一处理单元,用于通过所述目标检测模型对所述样本图像进行处理,得到所述样本图像对应的样本特征图,并根据所述样本特征图的高宽,对所述样本特征图构建矩阵网格,所述样本特征图包括所述样本图像对应的分类特征图、高宽特征图与旋转角度特征图;
索引建立单元,用于在所述高宽特征图中每个网格点建立一个高宽属性的索引,以及在所述旋转角度特征图中每个网格点建立一个旋转角度属性的索引;
第二处理单元,用于根据所述样本特征图中每个网格点及其对应的索引属性,得到所述样本旋转目标对应的样本检测框。
可选的,所述标注框内包括标注关键点,所述调整子模块包括:
计算单元,用于根据分类特征图的样本关键点,计算所述样本关键点与所述标注关键点之间的第一损失;
转换单元,用于通过预设的转换函数将所述度量距离转换为第二损失;
调整单元,用于基于所述第一损失与第二损失,对所述目标检测模型进行网络参数调整。
需要说明的是,本发明实施例提供的目标检测装置可以应用于能够进行目标检测的智能手机、电脑、服务器等设备。
本发明实施例提供的目标检测装置能够实现上述方法实施例中目标检测方法实现的各个过程,且可以达到相同的有益效果。为避免重复,这里不再赘述。
参见图4,图4是本发明实施例提供的一种电子设备的结构示意图,如图4所示,包括:存储器402、处理器401及存储在所述存储器402上并可在所述处理器401上运行的目标检测方法的计算机程序,其中:
处理器401用于调用存储器402存储的计算机程序,执行如下步骤:
获取待检测图像,所述待检测图像包括待检测旋转目标;
通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征;
基于所述辅助特征对所述分类特征进行辅助处理,得到所述待检测旋转目标的检测结果。
可选的,所述训练好的目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、辅助特征输出网络,处理器401执行的所述通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征,包括:
通过所述特征提取网络对所述待检测图像进行特征提取,得到所述待检测图像的多尺度特征;
通过所述特征融合网络对所述多尺度特征进行特征融合,得到所述待检测图像的融合特征;
通过所述分类特征输出网络对所述融合特征进行预测,得到所述待检测旋转目标的分类特征,所述分类特征包括特征通道,不同类别的待检测旋转目标对应于不同的特征通道;
通过所述辅助特征输出网络对所述融合特征进行预测,得到所述待检测旋转目标的辅助特征。
可选的,所述辅助特征包括高宽特征以及旋转角度特征,处理器401执行的所述基于所述辅助特征对所述分类特征进行辅助处理,得到所述待检测旋转目标的检测结果,包括:
对所述分类特征进行关键点提取,得到所述待检测旋转目标的目标关键 点;
基于所述目标关键点,在所述高宽特征中索引对应的目标高宽属性,以及在所述旋转角度特征中索引对应的目标旋转角度属性;
基于所述目标关键点、所述目标高宽属性以及所述目标旋转角度属性,得到所述待检测旋转目标的检测结果。
可选的,在所述通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征之前,处理器401执行的所述方法还包括:
获取训练数据集,所述训练数据集中包括样本图像以及标注框,所述样本图像中包括样本旋转目标,所述标注框为所述样本旋转目标的标注框;
获取目标检测模型,并通过所述训练数据集对所述目标检测模型进行训练,得到训练好的目标检测模型,所述目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、高宽特征输出网络以及旋转角度特征输出网络。
可选的,处理器401执行的所述获取目标检测模型,并通过所述训练数据集对所述目标检测模型进行训练,得到训练好的目标检测模型,包括:
将所述样本图像输入到所述目标检测模型,得到所述样本旋转目标对应的样本检测框;
将所述样本检测框与所述标注框分别通过预设的编码函数进行编码,分别得到所述样本检测框对应的样本函数分布,以及所述标注框对应的标注函数分布;
根据所述样本函数分布与所述标注函数分布之间的度量距离,对所述目标检测模型进行网络参数调整;
对所述目标检测模型的网络参数调整过程进行迭代,直到所述目标检测模型收敛或达到预设的迭代次数,得到训练好的目标检测模型。
可选的,处理器401执行的所述将所述样本图像输入到所述目标检测模型,得到所述样本旋转目标对应的样本检测框,包括:
通过所述目标检测模型对所述样本图像进行处理,得到所述样本图像对应的样本特征图,并根据所述样本特征图的高宽,对所述样本特征图构建矩阵网格,所述样本特征图包括所述样本图像对应的分类特征图、高宽特征图与旋转角度特征图;
在所述高宽特征图中每个网格点建立一个高宽属性的索引,以及在所述旋转角度特征图中每个网格点建立一个旋转角度属性的索引;
根据所述样本特征图中每个网格点及其对应的索引属性,得到所述样本旋转目标对应的样本检测框。
可选的,所述标注框内包括标注关键点,处理器401执行的所述根据所述样本函数分布与所述标注函数分布之间的度量距离,对所述目标检测模型进行网络参数调整,包括:
根据分类特征图的样本关键点,计算所述样本关键点与所述标注关键点之间的第一损失;
通过预设的转换函数将所述度量距离转换为第二损失;
基于所述第一损失与第二损失,对所述目标检测模型进行网络参数调整。
本发明实施例提供的电子设备能够实现上述方法实施例中目标检测方法实现的各个过程,且可以达到相同的有益效果。为避免重复,这里不再赘述。
本发明实施例还提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时实现本发明实施例提供的目标检测方法或应用端目标检测方法的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存取存储器(Random Access Memory,简称RAM)等。
以上所揭露的仅为本发明较佳实施例而已,当然不能以此来限定本发明之权利范围,因此依本发明权利要求所作的等同变化,仍属本发明所涵盖的范围。

Claims (10)

  1. 一种目标检测方法,其特征在于,所述目标检测方法用于旋转目标的检测,包括以下步骤:
    获取待检测图像,所述待检测图像包括待检测旋转目标;
    通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征;
    基于所述辅助特征对所述分类特征进行辅助处理,得到所述待检测旋转目标的检测结果。
  2. 如权利要求1所述的目标检测方法,其特征在于,所述训练好的目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、辅助特征输出网络,所述通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征,包括:
    通过所述特征提取网络对所述待检测图像进行特征提取,得到所述待检测图像的多尺度特征;
    通过所述特征融合网络对所述多尺度特征进行特征融合,得到所述待检测图像的融合特征;
    通过所述分类特征输出网络对所述融合特征进行预测,得到所述待检测旋转目标的分类特征,所述分类特征包括特征通道,不同类别的待检测旋转目标对应于不同的特征通道;
    通过所述辅助特征输出网络对所述融合特征进行预测,得到所述待检测旋转目标的辅助特征。
  3. 如权利要求2所述的目标检测方法,其特征在于,所述辅助特征包括高宽特征以及旋转角度特征,所述基于所述辅助特征对所述分类特征进行辅助处理,得到所述待检测旋转目标的检测结果,包括:
    对所述分类特征进行关键点提取,得到所述待检测旋转目标的目标关键点;
    基于所述目标关键点,在所述高宽特征中索引对应的目标高宽属性,以及在所述旋转角度特征中索引对应的目标旋转角度属性;
    基于所述目标关键点、所述目标高宽属性以及所述目标旋转角度属性,得到所述待检测旋转目标的检测结果。
  4. 如权利要求3所述的目标检测方法,其特征在于,在所述通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征之前,所述方法还包括:
    获取训练数据集,所述训练数据集中包括样本图像以及标注框,所述样本图像中包括样本旋转目标,所述标注框为所述样本旋转目标的标注框;
    获取目标检测模型,并通过所述训练数据集对所述目标检测模型进行训练,得到训练好的目标检测模型,所述目标检测模型包括特征提取网络、特征融合网络、分类特征输出网络、高宽特征输出网络以及旋转角度特征输出网络。
  5. 如权利要求4所述的目标检测方法,其特征在于,所述获取目标检测模型,并通过所述训练数据集对所述目标检测模型进行训练,得到训练好的目标检测模型,包括:
    将所述样本图像输入到所述目标检测模型,得到所述样本旋转目标对应的样本检测框;
    将所述样本检测框与所述标注框分别通过预设的编码函数进行编码,分别得到所述样本检测框对应的样本函数分布,以及所述标注框对应的标注函数分布;
    根据所述样本函数分布与所述标注函数分布之间的度量距离,对所述目标检测模型进行网络参数调整;
    对所述目标检测模型的网络参数调整过程进行迭代,直到所述目标检测模型收敛或达到预设的迭代次数,得到训练好的目标检测模型。
  6. 如权利要求5所述的目标检测方法,其特征在于,所述将所述样本图像输入到所述目标检测模型,得到所述样本旋转目标对应的样本检测框,包括:
    通过所述目标检测模型对所述样本图像进行处理,得到所述样本图像对应的样本特征图,并根据所述样本特征图的高宽,对所述样本特征图构建矩阵网格,所述样本特征图包括所述样本图像对应的分类特征图、高宽特征图与旋转角度特征图;
    在所述高宽特征图中每个网格点建立一个高宽属性的索引,以及在所述旋转角度特征图中每个网格点建立一个旋转角度属性的索引;
    根据所述样本特征图中每个网格点及其对应的索引属性,得到所述样本旋转目标对应的样本检测框。
  7. 如权利要求6所述的目标检测方法,其特征在于,所述标注框内包括标注关键点,所述根据所述样本函数分布与所述标注函数分布之间的度量距离,对所述目标检测模型进行网络参数调整,包括:
    根据分类特征图的样本关键点,计算所述样本关键点与所述标注关键点之间的第一损失;
    通过预设的转换函数将所述度量距离转换为第二损失;
    基于所述第一损失与第二损失,对所述目标检测模型进行网络参数调整。
  8. 一种目标检测装置,其特征在于,所述装置包括:
    第一获取模块,用于获取待检测图像,所述待检测图像包括待检测旋转目标;
    提取模块,用于通过训练好的目标检测模型对所述待检测图像进行特征提取,得到所述待检测旋转目标的分类特征和辅助特征;
    处理模块,用于基于所述辅助特征对所述分类特征进行辅助处理,得到所述待检测旋转目标的检测结果。
  9. 一种电子设备,其特征在于,包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如权利要求1至7中任一项所述的目标检测方法中的步骤。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至7中任一项所述的目标检测方法中的步骤。
PCT/CN2022/143514 2022-07-12 2022-12-29 目标检测方法、装置、电子设备及存储介质 WO2024011873A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210819568.0A CN115311553A (zh) 2022-07-12 2022-07-12 目标检测方法、装置、电子设备及存储介质
CN202210819568.0 2022-07-12

Publications (1)

Publication Number Publication Date
WO2024011873A1 true WO2024011873A1 (zh) 2024-01-18

Family

ID=83856438

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/143514 WO2024011873A1 (zh) 2022-07-12 2022-12-29 目标检测方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN115311553A (zh)
WO (1) WO2024011873A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311553A (zh) * 2022-07-12 2022-11-08 青岛云天励飞科技有限公司 目标检测方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191566A (zh) * 2019-12-26 2020-05-22 西北工业大学 基于像素分类的光学遥感图像多目标检测方法
CN111931877A (zh) * 2020-10-12 2020-11-13 腾讯科技(深圳)有限公司 目标检测方法、装置、设备及存储介质
CN113420648A (zh) * 2021-06-22 2021-09-21 深圳市华汉伟业科技有限公司 一种具有旋转适应性的目标检测方法及系统
WO2022142783A1 (zh) * 2020-12-29 2022-07-07 华为云计算技术有限公司 一种图像处理方法以及相关设备
CN115311553A (zh) * 2022-07-12 2022-11-08 青岛云天励飞科技有限公司 目标检测方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN115311553A (zh) 2022-11-08

Similar Documents

Publication Publication Date Title
CN110443143B (zh) 多分支卷积神经网络融合的遥感图像场景分类方法
US11488308B2 (en) Three-dimensional object detection method and system based on weighted channel features of a point cloud
CN108108764B (zh) 一种基于随机森林的视觉slam回环检测方法
US20230186056A1 (en) Grabbing detection method based on rp-resnet
WO2020108362A1 (zh) 人体姿态检测方法、装置、设备及存储介质
CN111259940B (zh) 一种基于空间注意力地图的目标检测方法
CN111079847B (zh) 一种基于深度学习的遥感影像自动标注方法
CN111950453A (zh) 一种基于选择性注意力机制的任意形状文本识别方法
CN110443258B (zh) 文字检测方法、装置、电子设备及存储介质
CN111815665B (zh) 基于深度信息与尺度感知信息的单张图像人群计数方法
WO2022218396A1 (zh) 图像处理方法、装置和计算机可读存储介质
US20220398737A1 (en) Medical image segmentation method based on u-network
CN112818969A (zh) 一种基于知识蒸馏的人脸姿态估计方法及系统
CN113313703A (zh) 基于深度学习图像识别的无人机输电线巡检方法
CN115937626B (zh) 基于实例分割的半虚拟数据集自动生成方法
CN112819008B (zh) 实例检测网络的优化方法、装置、介质及电子设备
WO2024011873A1 (zh) 目标检测方法、装置、电子设备及存储介质
CN111523586A (zh) 一种基于噪声可知的全网络监督目标检测方法
CN117541652A (zh) 一种基于深度lk光流法与d-prosac采样策略的动态slam方法
CN117788810A (zh) 一种无监督语义分割的学习系统
WO2024011853A1 (zh) 人体图像质量检测方法、装置、电子设备及存储介质
CN108154107B (zh) 一种确定遥感图像归属的场景类别的方法
CN116047418A (zh) 基于小样本的多模态雷达有源欺骗干扰识别方法
CN113705489B (zh) 基于先验区域知识指导的遥感影像细粒度飞机识别方法
CN111914751B (zh) 一种图像人群密度识别检测方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950979

Country of ref document: EP

Kind code of ref document: A1