CN117746193B

CN117746193B - Label optimization method and device, storage medium and electronic equipment

Info

Publication number: CN117746193B
Application number: CN202410194170.1A
Authority: CN
Inventors: 窦宝成; 马嘉; 兰昆艳; 任祖杰; 施航
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2024-02-21
Filing date: 2024-02-21
Publication date: 2024-05-10
Anticipated expiration: 2044-02-21
Also published as: CN117746193A

Abstract

The specification discloses a label optimization method and device, a storage medium, and electronic equipment. The label optimization method comprises: obtaining each basic label of a sample image; inputting the sample image and each basic label into a pre-trained fine-grained remote sensing target detection model to obtain the prediction labels of the sample image and the confidence corresponding to each prediction label; for each basic label, determining a quality weight according to the degree of consistency between the basic label and the prediction labels in its matched label set and the confidence of each prediction label in that set; screening out the basic labels to be optimized according to the quality weights; optimizing the basic labels to be optimized according to their matched label sets to obtain optimized labels; and training the fine-grained remote sensing target detection model according to the optimized labels and the other basic labels.

Description

Label optimization method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of remote sensing image processing technologies, and in particular to a label optimization method and device, a storage medium, and electronic equipment.

Background

With the rapid development of satellite remote sensing technology and improvements in hardware such as imaging platforms, the resolution of remote sensing images acquired by satellites keeps increasing. Satellite remote sensing also offers a wide detection range and rapid data acquisition, so fine-grained target detection on remote sensing images is of significant value in fields such as emergency disaster reduction and resource detection.

In general, when fine-grained target detection is performed on remote sensing images, the data labels used to train the fine-grained target detection model contain noise, so the detection results that the model produces for remote sensing images have low accuracy.

Therefore, how to improve the accuracy of the detection result of fine-grained target detection for the remote sensing image is a problem to be solved urgently.

Disclosure of Invention

The present disclosure provides a label optimization method and device, a storage medium, and electronic equipment, so as to partially solve the foregoing problems in the prior art.

The technical scheme adopted in the specification is as follows:

the specification provides a label optimization method for training a fine-grained remote sensing target detection model, the method comprising:

Acquiring each basic label of the sample image, wherein the basic label comprises: the category of the target object contained in the sample image, and the angular point position of the target frame corresponding to the target object in the sample image;

Inputting the sample image and each basic label into a pre-trained fine-granularity remote sensing target detection model to obtain a prediction label of the sample image and a confidence coefficient corresponding to the prediction label;

determining a matched tag set corresponding to each basic tag from each prediction tag according to the position relation between each corner position of the target frame in each prediction tag and each corner position of the target frame in each basic tag;

for each basic label, determining the quality weight corresponding to the basic label according to the consistency degree between the basic label and the predicted label in the matched label set corresponding to the basic label and the confidence degree of each predicted label contained in the matched label set corresponding to the basic label;

And screening basic labels to be optimized from the basic labels according to the quality weight corresponding to each basic label, optimizing the basic labels to be optimized according to the prediction labels in the matched label set corresponding to the basic labels to be optimized, obtaining optimized labels, and training the fine-granularity remote sensing target detection model according to the optimized labels and other basic labels.

Optionally, acquiring each basic label of the sample image specifically includes:

acquiring original labels of a sample image, wherein the original labels comprise: the category of the corresponding object, the angular point position of the object frame of the corresponding object;

inputting the sample image and each original label into a pre-trained fine-grained remote sensing target detection model to obtain each pseudo label and the confidence coefficient corresponding to each pseudo label;

And optimizing the original label of the sample image according to the consistency degree between each pseudo label and each original label and the confidence degree corresponding to each pseudo label to obtain the basic label of the sample image.

Optionally, optimizing the original label of the sample image according to the consistency degree between each pseudo label and each original label and the confidence corresponding to each pseudo label specifically includes:

For each original label, determining an overlapping degree value between the target frame corresponding to the target object contained in the original label and the target frame corresponding to the target object contained in each pseudo label according to each corner position of the target frame corresponding to the target object contained in the original label and each corner position of the target frame corresponding to the target object contained in each pseudo label;

Determining the consistency degree between the original label and each pseudo label according to the overlap degree value between the target frame corresponding to the target object contained in the original label and the target frame corresponding to the target object contained in each pseudo label;

and optimizing the original label of the sample image according to the consistency degree between each pseudo label and each original label and the confidence degree corresponding to each pseudo label.

Optionally, optimizing the original label of the sample image according to the consistency degree between each pseudo label and each original label and the confidence degree corresponding to each pseudo label to obtain a basic label of the sample image, which specifically includes:

for each original label, determining a pseudo label matched with the original label according to the consistency degree between the original label and each pseudo label;

Judging whether the confidence coefficient corresponding to the pseudo tag matched with the original tag exceeds a preset confidence coefficient threshold value or not;

If yes, optimizing the original label according to the pseudo label matched with the original label to obtain the basic label of the sample image.

Optionally, the fine-grained remote sensing target detection model is trained in advance, and specifically comprises the following steps:

performing pretraining on the fine-grained remote sensing target detection model for a plurality of times through each basic label to obtain a fine-grained remote sensing target detection model after pretraining each time; wherein,

Randomly dividing each basic label into a plurality of groups aiming at each pre-training to obtain each basic label set in the pre-training;

For each basic tag set, other basic tag sets are used as training sets to pre-train the fine-grained remote sensing target detection model, and a pre-trained fine-grained remote sensing target detection model corresponding to the basic tag set is obtained;

inputting the sample image and each basic label into a pre-trained fine-grained remote sensing target detection model to obtain a prediction label of the sample image and a confidence coefficient corresponding to the prediction label, wherein the method specifically comprises the following steps of:

and inputting the sample image and the basic tag set into a pre-trained fine-grained remote sensing target detection model corresponding to the basic tag set to obtain a prediction tag of the sample image and a confidence coefficient corresponding to the prediction tag.

Optionally, for each basic tag, determining a quality weight corresponding to the basic tag according to a degree of coincidence between the basic tag and a predicted tag in a matched tag set corresponding to the basic tag, and a confidence level of each predicted tag included in the matched tag set corresponding to the basic tag, which specifically includes:

For each basic label, determining the label category of each prediction label in the matched label set corresponding to the basic label according to the consistency degree between the basic label and the prediction label in the matched label set corresponding to the basic label; wherein,

For each prediction tag in the matching tag set corresponding to the basic tag, if the category of the target object in the basic tag is consistent with the category of the target object in the prediction tag, determining the marking category of the prediction tag as a first marking category, and if the category of the target object in the basic tag is inconsistent with the category of the target object in the prediction tag, determining the marking category of the prediction tag as a second marking category;

And determining the quality weight corresponding to the basic label according to the label category of the predicted label in the matched label set corresponding to the basic label and the confidence level of each predicted label contained in the matched label set corresponding to the basic label.

Optionally, determining the quality weight corresponding to the basic tag according to the label category of the predicted tag in the matched tag set corresponding to the basic tag and the confidence level of each predicted tag included in the matched tag set corresponding to the basic tag specifically includes:

normalizing the confidence coefficient of each prediction label contained in the matched label set corresponding to the basic label to obtain a normalized confidence coefficient corresponding to the basic label;

And inputting the label category of the prediction label in the matched label set corresponding to the basic label and the normalized confidence coefficient corresponding to the basic label into a preset confidence learning model so as to determine the quality weight corresponding to the basic label through the confidence learning model.

Optionally, training the fine-grained remote sensing target detection model according to the optimized label and other basic labels specifically includes:

Inputting the optimized labels or other basic labels into the fine-granularity remote sensing target detection model aiming at each optimized label or other basic labels to obtain the identification result of the target objects contained in the optimized labels or other basic labels;

Determining cross entropy loss corresponding to the optimized label or other basic labels according to deviation between the identification result of the target object contained in the optimized label or other basic labels and the actual classification result of the target object contained in the optimized label or other basic labels, which are output by the fine-granularity remote sensing target detection model;

determining the weight of the cross entropy loss corresponding to the optimized label or other basic labels according to the normalized frequency corresponding to the actual classification result of the target object contained in the optimized label or other basic labels;

And according to the weight, fusing the cross entropy loss corresponding to each optimized label and other basic labels to obtain fusion loss, and training the fine-granularity remote sensing target detection model by taking the minimum fusion loss as an optimized target to obtain the trained fine-granularity remote sensing target detection model.

Optionally, the fine-grained remote sensing target detection model includes a feature extraction network and a two-stage detector, wherein the two-stage detector comprises a rotated region of interest learner (RRoI Learner) and rotated region of interest warping (RRoI Warping).

The present specification provides a label optimizing apparatus including:

The acquisition module is used for acquiring each basic label of the sample image, and the basic labels comprise: the category of the target object contained in the sample image, and the angular point position of the target frame corresponding to the target object in the sample image;

the confidence coefficient determining module is used for inputting the sample image and each basic label into a pre-trained fine-grained remote sensing target detection model to obtain a prediction label of the sample image and a confidence coefficient corresponding to the prediction label;

The clustering module is used for determining a matched tag set corresponding to each basic tag from each prediction tag according to the position relation between each corner position of the target frame in each prediction tag and each corner position of the target frame in each basic tag;

The weight determining module is used for determining the quality weight corresponding to each basic label according to the consistency degree between the basic label and the predicted labels in the matched label set corresponding to the basic label and the confidence degree of each predicted label contained in the matched label set corresponding to the basic label;

the optimizing module is used for screening basic labels to be optimized from the basic labels according to the quality weight corresponding to each basic label, optimizing the basic labels to be optimized according to the prediction labels in the matched label set corresponding to the basic labels to be optimized, obtaining optimized labels, and training the fine-granularity remote sensing target detection model according to the optimized labels and other basic labels.

The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:

In the label optimization method provided by the specification, each basic label of a sample image is obtained, and the sample image and each basic label are input into a pre-trained fine-grained remote sensing target detection model to obtain the prediction labels of the sample image and the confidence corresponding to each prediction label. For each basic label, a quality weight is determined according to the degree of consistency between the basic label and the prediction labels in its matched label set and the confidence of each prediction label in that set. The basic labels to be optimized are screened out according to the quality weights and optimized according to their matched label sets to obtain optimized labels, and the fine-grained remote sensing target detection model is trained according to the optimized labels and the other basic labels.

According to the method, the basic labels of the sample images for training the fine-granularity remote sensing target detection model can be optimized, so that noise contained in the basic labels of the sample images for training the fine-granularity remote sensing target detection model is reduced, and the accuracy of detection results of fine-granularity target detection on the remote sensing images can be improved.

Drawings

The accompanying drawings are included to provide a further understanding of the specification. The exemplary embodiments of the present specification and their descriptions serve to explain the specification and are not intended to limit it unduly. In the drawings:

FIG. 1 is a schematic flow chart of a label optimization method provided in the present specification;

FIG. 2 is a schematic diagram of a fine-grained remote sensing target detection model provided in the present specification;

FIG. 3 is a schematic diagram of the optimization process of the base label provided in the present specification;

FIG. 4 is a schematic diagram of a label optimizing apparatus provided herein;

Fig. 5 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.

Fig. 1 is a schematic flow chart of a label optimization method provided in the present specification, including the following steps:

S101: acquiring each basic label of the sample image, wherein the basic label comprises: the category of the target object contained in the sample image, and the corner positions of the target frame corresponding to the target object in the sample image.

In this specification, the service platform may obtain a historically collected remote sensing image as a sample image and obtain each original label of the sample image, where the original labels include: the category of the corresponding object and the angular point position of the object frame of the corresponding object.

For example: if the sample image is an image of a plurality of warships on the sea surface, each warship can serve as a target object, and each target object is provided with at least one original label correspondingly.

Further, the service platform can input the sample image and each original label into a pre-trained fine-grained remote sensing target detection model to obtain each pseudo label and the confidence coefficient corresponding to each pseudo label, and further optimize the original label of the sample image according to the consistency degree between each pseudo label and each original label and the confidence coefficient corresponding to each pseudo label to obtain the basic label of the sample image.

Specifically, for each original label and each pseudo label, whether the consistency degree between the original label and the pseudo label exceeds a specified threshold is judged, if so, the pseudo label is determined to be the pseudo label matched with the original label.

Further, from the pseudo labels matched with the original label, the pseudo label with the highest degree of consistency with the original label is selected as the pseudo label matched with the original label, and it is judged whether the confidence of this pseudo label exceeds a preset confidence threshold. If so, the category of the target object and the corner positions of the target frame recorded in the original label are replaced by the category of the target object and the corner positions of the target frame recorded in the matched pseudo label, and the basic label of the sample image is obtained.
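The first-stage replacement described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the specification: the `Label` class, the function name `refine_original_labels`, the precomputed `consistency` matrix, and the two threshold values (0.5 and 0.8) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Corner = Tuple[float, float]


@dataclass
class Label:
    """Hypothetical label container: category plus target-frame corners."""
    category: str
    corners: Tuple[Corner, ...]
    confidence: Optional[float] = None  # only set for pseudo labels


def refine_original_labels(
    originals: Sequence[Label],
    pseudos: Sequence[Label],
    consistency: Sequence[Sequence[float]],
    match_threshold: float = 0.5,        # assumed "specified threshold"
    confidence_threshold: float = 0.8,   # assumed "preset confidence threshold"
) -> List[Label]:
    """consistency[i][j] is the consistency between originals[i] and pseudos[j]."""
    base_labels: List[Label] = []
    for i, original in enumerate(originals):
        # Pseudo labels whose consistency exceeds the specified threshold.
        candidates = [j for j in range(len(pseudos))
                      if consistency[i][j] > match_threshold]
        if not candidates:
            base_labels.append(original)
            continue
        # Keep only the candidate with the highest consistency.
        best = max(candidates, key=lambda j: consistency[i][j])
        matched = pseudos[best]
        if matched.confidence is not None and matched.confidence > confidence_threshold:
            # Replace category and corner positions with those of the pseudo label.
            base_labels.append(Label(matched.category, matched.corners))
        else:
            base_labels.append(original)
    return base_labels
```

An original label with no sufficiently consistent or sufficiently confident pseudo label is passed through unchanged, which matches the "if yes" wording of the step.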

The degree of consistency between each pseudo label and each original label may be determined by matching the pseudo labels against the original labels with a non-maximum suppression (NMS) method and determining the Intersection over Union (IoU) value between each pseudo label and each original label. For any pseudo label and any original label, the higher the IoU value between them, the higher their degree of consistency.

The IoU value between an original label and each pseudo label may be determined as follows: according to the corner positions of the target frame corresponding to the target object contained in the original label and the corner positions of the target frame corresponding to the target object contained in each pseudo label, an overlap value between the two target frames is determined and used as the IoU value between the original label and that pseudo label.
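As a rough illustration of computing an overlap value from corner positions, the sketch below reduces each target frame to its axis-aligned enclosing rectangle before computing IoU. This is a deliberate simplification and an assumption of the example: the target frames in the specification may be rotated, in which case a polygon-intersection IoU would be needed instead. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Corner = Tuple[float, float]


def enclosing_rect(corners: Sequence[Corner]) -> Tuple[float, float, float, float]:
    """Axis-aligned rectangle (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def iou(corners_a: Sequence[Corner], corners_b: Sequence[Corner]) -> float:
    """IoU of the enclosing rectangles (simplification for rotated frames)."""
    ax1, ay1, ax2, ay2 = enclosing_rect(corners_a)
    bx1, by1, bx2, by2 = enclosing_rect(corners_b)
    # Width and height of the intersection, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

For two 2x2 squares offset by one unit in each direction, the intersection area is 1 and the union area is 7, so the IoU is 1/7.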

In an actual application scenario, the service platform may further randomly divide the original labels of the sample image into several groups (for example, five groups). For each group of original labels, that group is used as a verification set and the other groups are used as a training set, and the fine-grained remote sensing target detection model is pre-trained on the other groups to obtain the pre-trained fine-grained remote sensing target detection model corresponding to that group. Each original label in the group is then input into the pre-trained model corresponding to the group to obtain each pseudo label and the confidence corresponding to each pseudo label.
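The grouping scheme above resembles ordinary k-fold cross-validation, and can be sketched as follows. The function names and the fixed random seed are assumptions of this example; the training and inference calls themselves are left out because the specification does not fix them.

```python
import random
from typing import Iterator, List, Tuple


def split_into_folds(num_labels: int, num_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Randomly divide label indices into num_folds groups."""
    indices = list(range(num_labels))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives groups of near-equal size.
    return [indices[k::num_folds] for k in range(num_folds)]


def cross_validation_rounds(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, held-out indices) once per group."""
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, held_out
```

In each round, a model would be pre-trained on `train` and then used to produce pseudo labels only for the labels in `held_out`, so that no label is predicted by a model that was trained on it.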

In the present specification, the execution body for implementing the tag optimization method may refer to a server or other specified devices provided in a service platform, or may be a terminal device such as a notebook computer, a desktop computer, or a mobile phone, and the tag optimization method provided in the present specification will be described below by taking the server as an example of the execution body.

S102: and inputting the sample image and each basic label into a pre-trained fine-granularity remote sensing target detection model to obtain a prediction label of the sample image and a confidence coefficient corresponding to the prediction label.

In the specification, the server may input the sample image and each basic label into a pre-trained fine-grained remote sensing target detection model, so as to obtain a prediction label of the sample image and a confidence coefficient corresponding to the prediction label.

The fine-grained remote sensing target detection model may be trained as follows. The model is pre-trained several times using the basic labels, yielding a pre-trained fine-grained remote sensing target detection model after each pre-training. For each pre-training, the basic labels are randomly divided into several groups (for example, five groups) to obtain the basic label sets for that pre-training. For each basic label set, the other basic label sets are used as the training set to pre-train the model, yielding the pre-trained fine-grained remote sensing target detection model corresponding to that basic label set. The sample image and the basic label set are then input into the pre-trained model corresponding to that basic label set to obtain the prediction labels of the sample image and the confidence corresponding to each prediction label. The fine-grained remote sensing target detection model is shown in Fig. 2.

Fig. 2 is a schematic diagram of a fine-grained remote sensing target detection model provided in the present specification.

As can be seen from Fig. 2, the fine-grained remote sensing target detection model includes a feature extraction network and a two-stage detector. The two-stage detector comprises a rotated region of interest learner (RRoI Learner) and rotated region of interest warping (RRoI Warping). The RRoI Learner is mainly used to learn the transformation from horizontal regions of interest (HRoIs) to rotated regions of interest (RRoIs); in other words, it learns to extract from the sample image the target frame corresponding to a target object, where the target frame may be tilted at an angle. For example, the target frame extracted for a target object in the sample image may be a rectangular target frame with a given inclination angle. RRoI Warping is used to extract rotation-invariant features from the RRoIs for detection and recognition on the sample image.

The feature extraction network may be a Swin Transformer. The Swin Transformer uses a shifted-window operation and a hierarchical design, so it can extract the local features contained in a sample image through different windows while keeping the overall computation of the model under control.

Further, in order to make the network more robust for fine-grained classification, the feature extraction network is also configured with a stochastic depth rate, so that a proportion of network paths is randomly dropped during training.

S103: and determining a matched tag set corresponding to each basic tag from the predictive tags according to the position relation between each corner position of the target frame in each predictive tag and each corner position of the target frame in each basic tag.

Further, the server may determine, from the prediction tags, a matching tag set corresponding to each basic tag according to a positional relationship between each corner position of the target frame in each prediction tag and each corner position of the target frame in each basic tag, as shown in fig. 3.

Fig. 3 is a schematic diagram of the optimization process of the basic tag provided in the present specification.

As can be seen from Fig. 3, the server may match each prediction label with each basic label by an NMS method according to the positional relationship between the corner positions of the target frame in each prediction label and the corner positions of the target frame in each basic label, and determine the IoU value between each prediction label and each basic label. For each basic label, the prediction labels whose IoU value with the basic label exceeds a specified threshold are screened out from the prediction labels as the matched label set corresponding to the basic label.
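Once the IoU values are available, building the matched label sets is a simple filtering step. The sketch below assumes the IoU values have already been computed and arranged as a matrix with one row per basic label and one column per prediction label; the function name and the threshold value of 0.5 are assumptions of this example.

```python
from typing import List, Sequence


def build_matched_sets(
    iou_matrix: Sequence[Sequence[float]],
    iou_threshold: float = 0.5,  # assumed "specified threshold"
) -> List[List[int]]:
    """For each basic label (row), return the indices of the prediction
    labels (columns) whose IoU with it exceeds the threshold."""
    return [
        [j for j, value in enumerate(row) if value > iou_threshold]
        for row in iou_matrix
    ]
```

A basic label with no prediction above the threshold simply receives an empty matched label set.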

S104: and determining the quality weight corresponding to each basic label according to the consistency degree between the basic label and the predicted label in the matched label set corresponding to the basic label and the confidence degree of each predicted label contained in the matched label set corresponding to the basic label.

The server may determine, for each base label, a label category of each predictive label in the set of matched labels corresponding to the base label according to a degree of agreement between the base label and the predictive label in the set of matched labels corresponding to the base label. For each prediction tag in the matching tag set corresponding to the base tag, if the category of the target object in the base tag is consistent with the category of the target object in the prediction tag, determining the marking category of the prediction tag as a first marking category, and if the category of the target object in the base tag is inconsistent with the category of the target object in the prediction tag, determining the marking category of the prediction tag as a second marking category.

Further, the server may normalize the confidence coefficient of each prediction tag included in the matching tag set corresponding to the base tag to obtain a normalized confidence coefficient corresponding to the base tag, input the label category of the prediction tag in the matching tag set corresponding to the base tag, and input the normalized confidence coefficient corresponding to the base tag into a preset confidence learning model, so as to determine the quality weight corresponding to the base tag through the confidence learning model.

The method for determining the normalized confidence coefficient corresponding to the basic label by the server may be that the confidence coefficient of each prediction label included in the matched label set corresponding to the basic label is accumulated to obtain a confidence coefficient accumulated value corresponding to the basic label, and further the normalized confidence coefficient corresponding to the basic label may be obtained by dividing the confidence coefficient accumulated value corresponding to the basic label by the number of each prediction label included in the matched label set corresponding to the basic label.
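The two inputs to the confidence learning model described above, namely the marking categories and the normalized confidence, can be sketched as follows. The constants `FIRST_MARK` and `SECOND_MARK`, the function names, and the representation of a prediction label as a `(category, confidence)` pair are assumptions of this example. The confidence learning model itself is not sketched, because the specification does not describe its internals.

```python
from typing import List, Sequence, Tuple

FIRST_MARK = 1   # prediction category agrees with the basic label
SECOND_MARK = 2  # prediction category disagrees with the basic label

Prediction = Tuple[str, float]  # (category, confidence)


def mark_predictions(base_category: str, matched: Sequence[Prediction]) -> List[int]:
    """Assign a marking category to each prediction in the matched label set."""
    return [FIRST_MARK if category == base_category else SECOND_MARK
            for category, _ in matched]


def normalized_confidence(matched: Sequence[Prediction]) -> float:
    """Accumulated confidence divided by the number of matched predictions."""
    if not matched:
        return 0.0  # assumption: an empty matched set yields zero
    return sum(conf for _, conf in matched) / len(matched)
```

With this definition the normalized confidence is simply the mean confidence over the matched label set.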

S105: and screening basic labels to be optimized from the basic labels according to the quality weight corresponding to each basic label, optimizing the basic labels to be optimized according to the prediction labels in the matched label set corresponding to the basic labels to be optimized, obtaining optimized labels, and training the fine-granularity remote sensing target detection model according to the optimized labels and other basic labels.

In the specification, the server may screen out each basic label with a corresponding quality weight lower than a preset quality weight threshold from the basic labels according to the quality weight corresponding to each basic label, and optimize the basic label to be optimized according to each prediction label in the matched label set corresponding to the basic label to be optimized, so as to obtain an optimized label.
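The screening step can be sketched as below. The specification does not state how a basic label is rewritten from its matched label set, so this example assumes one plausible rule: the low-quality basic label is replaced by the highest-confidence prediction in its matched set. The dictionary layout, the function name, and the threshold of 0.5 are likewise assumptions.

```python
from typing import Dict, List, Sequence


def optimize_labels(
    base_labels: Sequence[Dict],
    quality_weights: Sequence[float],
    matched_sets: Sequence[Sequence[Dict]],
    weight_threshold: float = 0.5,  # assumed "preset quality weight threshold"
) -> List[Dict]:
    """Replace basic labels whose quality weight is below the threshold."""
    optimized: List[Dict] = []
    for label, weight, matched in zip(base_labels, quality_weights, matched_sets):
        if weight < weight_threshold and matched:
            # Assumed rule: take the most confident matched prediction.
            best = max(matched, key=lambda p: p["confidence"])
            optimized.append({"category": best["category"],
                              "corners": best["corners"]})
        else:
            # High-quality labels (or labels with no match) are kept as-is.
            optimized.append(label)
    return optimized
```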

In an actual application scenario, in order to further improve the accuracy of the optimized label obtained by the method, the server may further use a label smoothing policy to perform label smoothing processing on the optimized label, where a smoothing coefficient may be 0.9.
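A minimal sketch of label smoothing follows. It assumes that the smoothing coefficient of 0.9 is the probability mass kept on the labeled category and that the remaining 0.1 is spread evenly over the other categories; the specification does not say which convention it uses, and the function name is hypothetical.

```python
from typing import List


def smooth_label(class_index: int, num_classes: int, keep: float = 0.9) -> List[float]:
    """Build a smoothed target distribution for one label."""
    if num_classes < 2:
        return [1.0]
    # Mass not kept on the labeled class is shared by the other classes.
    off_value = (1.0 - keep) / (num_classes - 1)
    target = [off_value] * num_classes
    target[class_index] = keep
    return target
```

For five categories, the labeled category receives 0.9 and each of the other four receives 0.025.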

Further, the server may input the optimized tag or other basic tag into the fine-grained remote sensing target detection model for each optimized tag or other basic tag, obtain a recognition result of the target object included in the optimized tag or other basic tag, and determine a cross entropy loss corresponding to the optimized tag or other basic tag according to a deviation between the recognition result of the target object included in the optimized tag or other basic tag and an actual classification result of the target object included in the optimized tag or other basic tag output by the fine-grained remote sensing target detection model.

Further, the server may determine the weight of the cross entropy loss corresponding to the optimized tag or other basic tag according to the normalized frequency corresponding to the actual classification result of the object included in the optimized tag or other basic tag, and specifically may refer to the following formula:

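$$\mathrm{logit}_{\mathrm{adjust}} = \mathrm{logit}_{\mathrm{pred}} + \log\left(\mathrm{prob}^{\tau} + 10^{-12}\right)$$

In the above formula, $\mathrm{logit}_{\mathrm{adjust}}$ is the adjusted category logit, $\mathrm{logit}_{\mathrm{pred}}$ is the predicted category logit, $\mathrm{prob}$ is the normalized frequency of the category, and $\tau$ is the coefficient associated with the estimated category prior, which defaults to 1.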
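The adjustment can be illustrated with a short sketch. It assumes that the term inside the logarithm is the normalized category frequency raised to the power tau, and that the adjusted logits are then fed to an ordinary softmax cross entropy; the function names are hypothetical.

```python
import math


def adjust_logits(logits, class_freq, tau=1.0, eps=1e-12):
    """Add log(prob ** tau + eps) to each predicted category logit."""
    return [z + math.log(p ** tau + eps) for z, p in zip(logits, class_freq)]


def cross_entropy(logits, target_index):
    """Softmax cross entropy for a single sample, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


# Two categories with normalized frequencies 0.9 (head) and 0.1 (tail).
adjusted = adjust_logits([0.0, 0.0], [0.9, 0.1])
loss_head = cross_entropy(adjusted, 0)
loss_tail = cross_entropy(adjusted, 1)
```

With equal raw logits, the adjustment lowers the tail category's logit more than the head category's, so a sample labelled with the tail category incurs the larger loss and therefore contributes more strongly to training.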

Further, the server can fuse the cross entropy loss corresponding to each optimized label and other basic labels according to the weight to obtain fusion loss, and train the fine-grained remote sensing target detection model by taking the minimum fusion loss as an optimization target to obtain the trained fine-grained remote sensing target detection model.
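The fusion step can be sketched as a weighted combination of the per-label losses. The specification does not state the exact form of the fusion, so the normalized weighted sum below and the name `fuse_losses` are assumptions for illustration only.

```python
def fuse_losses(losses, weights):
    """Fuse per-label cross entropy losses into one scalar.

    Assumption: fusion is a weighted sum normalized by the total weight.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(w * l for w, l in zip(weights, losses)) / total_weight
```

The resulting scalar is the quantity that training would then seek to minimize.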

From the above, it can be seen that the server can reduce the noise contained in the labels through two-stage label optimization. This avoids the situation in which the trained fine-grained remote sensing target detection model produces inaccurate detection results when performing fine-grained target detection on remote sensing images because of a possible long-tail distribution problem in the training sample data, that is, an imbalance in the quantity of sample data across categories, in which most of the data is concentrated at the head (the most common part) and a small number of extreme values are distributed at the tail (the rare part).

The foregoing describes one or more label optimization methods according to the present disclosure, and based on the same concept, the present disclosure further provides a corresponding label optimization apparatus, as shown in fig. 4.

Fig. 4 is a schematic diagram of a label optimizing apparatus provided in the present specification, including:

An obtaining module 401, configured to obtain each basic label of the sample image, where the basic label includes: the category of the target object contained in the sample image, and the corner positions of the target frame corresponding to the target object in the sample image;

The confidence determining module 402 is configured to input the sample image and each basic label into a pre-trained fine-grained remote sensing target detection model, so as to obtain a prediction label of the sample image and a confidence corresponding to the prediction label;

The clustering module 403 is configured to determine, from the prediction labels, a matched label set corresponding to each basic label according to the positional relationship between each corner position of the target frame in each prediction label and each corner position of the target frame in each basic label;

The weight determining module 404 is configured to determine, for each base label, a quality weight corresponding to the base label according to a degree of coincidence between the base label and a predicted label in a matching label set corresponding to the base label, and a confidence level of each predicted label included in the matching label set corresponding to the base label;

The optimizing module 405 is configured to screen out the basic labels to be optimized from the basic labels according to the quality weight corresponding to each basic label, optimize each basic label to be optimized according to the prediction labels in the matched label set corresponding to that basic label to obtain optimized labels, and train the fine-grained remote sensing target detection model according to the optimized labels and the other basic labels.

Optionally, the obtaining module 401 is specifically configured to obtain each original label of the sample image, where the original label includes: the category of the corresponding target object, and the corner positions of the target frame of the corresponding target object; input the sample image and each original label into a pre-trained fine-grained remote sensing target detection model to obtain each pseudo label and the confidence corresponding to each pseudo label; and optimize the original labels of the sample image according to the degree of consistency between each pseudo label and each original label and the confidence corresponding to each pseudo label, so as to obtain the basic labels of the sample image.

Optionally, the obtaining module 401 is specifically configured to determine, for each original tag, a value of an overlapping degree between a target frame corresponding to a target object included in the original tag and a target frame corresponding to a target object included in each pseudo tag according to each corner position of the target frame corresponding to the target object included in the original tag and each corner position of the target frame corresponding to the target object included in each pseudo tag; determining the consistency degree between the original label and each pseudo label according to the overlap degree value between the target frame corresponding to the target object contained in the original label and the target frame corresponding to the target object contained in each pseudo label; and optimizing the original label of the sample image according to the consistency degree between each pseudo label and each original label and the confidence degree corresponding to each pseudo label.

Optionally, the obtaining module 401 is specifically configured to determine, for each original tag, a pseudo tag that matches the original tag according to a degree of agreement between the original tag and each pseudo tag; judging whether the confidence coefficient corresponding to the pseudo tag matched with the original tag exceeds a preset confidence coefficient threshold value or not; if yes, optimizing the original label according to the pseudo label matched with the original label to obtain the basic label of the sample image.
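The first-stage optimization handled by the obtaining module 401 can be sketched as follows. The sketch makes several assumptions that are not stated in the specification: target frames are treated as axis-aligned boxes `(x_min, y_min, x_max, y_max)` rather than arbitrary corner sets, the pseudo label with the highest overlap degree is taken as the match, a minimum overlap `iou_thr` is required, and an original label is replaced wholesale by its matched pseudo label. All names and threshold values are hypothetical.

```python
def iou(box_a, box_b):
    """Overlap degree of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def refine_original(original, pseudo_labels, iou_thr=0.5, conf_thr=0.8):
    """Return the basic label derived from one original label."""
    best, best_iou = None, 0.0
    for pseudo in pseudo_labels:
        overlap = iou(original["box"], pseudo["box"])
        if overlap > best_iou:
            best, best_iou = pseudo, overlap
    # Replace only when a match exists and its confidence exceeds the threshold.
    if best is not None and best_iou >= iou_thr and best["confidence"] > conf_thr:
        return {"category": best["category"], "box": best["box"]}
    return {"category": original["category"], "box": original["box"]}
```

For rotated target frames, the overlap computation would need a polygon intersection in place of the axis-aligned one used here.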

Optionally, the apparatus further comprises: a pre-training module 406;

The pre-training module 406 is specifically configured to perform a plurality of pre-training on the fine-grained remote sensing target detection model through the base labels, so as to obtain a fine-grained remote sensing target detection model after each pre-training; the basic labels are randomly divided into a plurality of groups aiming at each pre-training to obtain basic label sets in the pre-training; for each basic tag set, other basic tag sets are used as training sets to pre-train the fine-grained remote sensing target detection model, and a pre-trained fine-grained remote sensing target detection model corresponding to the basic tag set is obtained; and inputting the sample image and the basic tag set into a pre-trained fine-grained remote sensing target detection model corresponding to the basic tag set to obtain a prediction tag of the sample image and a confidence coefficient corresponding to the prediction tag.
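One round of the grouped pre-training described above resembles a cross-validation loop and can be sketched as follows. `train_fn` and `predict_fn` are hypothetical callbacks standing in for pre-training the detection model and for running it on the held-out group; the specification repeats such rounds several times, whereas the sketch shows a single round.

```python
import random


def split_into_groups(labels, num_groups, seed=None):
    """Randomly divide the basic labels into num_groups groups."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return [shuffled[i::num_groups] for i in range(num_groups)]


def cross_predict(labels, num_groups, train_fn, predict_fn, seed=None):
    """For each group, train on the other groups and predict on that group."""
    groups = split_into_groups(labels, num_groups, seed)
    predictions = []
    for i, held_out in enumerate(groups):
        training = [x for j, g in enumerate(groups) if j != i for x in g]
        model = train_fn(training)  # pre-train without the held-out group
        predictions.extend(predict_fn(model, held_out))
    return predictions
```

Because each prediction label comes from a model that never saw the corresponding basic label during pre-training, the model cannot simply memorize a noisy label and repeat it.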

Optionally, the weight determining module 404 is specifically configured to determine, for each base label, a label category of each prediction label in the matching label set corresponding to the base label according to a degree of coincidence between the base label and the prediction label in the matching label set corresponding to the base label; for each prediction tag in the matching tag set corresponding to the basic tag, if the category of the target object in the basic tag is consistent with the category of the target object in the prediction tag, determining the marking category of the prediction tag as a first marking category, and if the category of the target object in the basic tag is inconsistent with the category of the target object in the prediction tag, determining the marking category of the prediction tag as a second marking category; and determining the quality weight corresponding to the basic label according to the label category of the predicted label in the matched label set corresponding to the basic label and the confidence level of each predicted label contained in the matched label set corresponding to the basic label.

Optionally, the weight determining module 404 is specifically configured to normalize the confidence coefficient of each prediction tag included in the matching tag set corresponding to the base tag, to obtain a normalized confidence coefficient corresponding to the base tag; and inputting the label category of the prediction label in the matched label set corresponding to the basic label and the normalized confidence coefficient corresponding to the basic label into a preset confidence learning model so as to determine the quality weight corresponding to the basic label through the confidence learning model.
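The marking and normalization steps can be sketched as follows. The specification leaves the internals of the preset confidence learning model open, so the final scoring rule here, which takes the quality weight as the share of normalized confidence carried by predictions of the first marking category, is purely an assumed stand-in. All names are hypothetical.

```python
FIRST, SECOND = 1, 2  # marking categories: consistent / inconsistent category


def mark_predictions(base_category, predictions):
    """Mark each matched prediction by whether its category agrees with the basic label."""
    return [FIRST if p["category"] == base_category else SECOND for p in predictions]


def normalize(confidences):
    """Scale confidences so that they sum to 1 (all zeros if the sum is 0)."""
    total = sum(confidences)
    if total == 0:
        return [0.0 for _ in confidences]
    return [c / total for c in confidences]


def quality_weight(base_category, predictions):
    """Assumed stand-in for the confidence learning model."""
    marks = mark_predictions(base_category, predictions)
    norm = normalize([p["confidence"] for p in predictions])
    return sum(c for m, c in zip(marks, norm) if m == FIRST)
```

Under this assumed rule, a basic label whose matched predictions mostly disagree with its category receives a low quality weight and is more likely to be screened out for optimization.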

Optionally, the optimizing module 405 is specifically configured to, for each optimized label or other basic label, input the optimized label or other basic label into the fine-grained remote sensing target detection model to obtain the recognition result, output by the model, of the target object contained in that label; determine the cross entropy loss corresponding to that label according to the deviation between this recognition result and the actual classification result of the target object contained in that label; determine the weight of the cross entropy loss corresponding to that label according to the normalized frequency corresponding to the actual classification result of the target object contained in that label; and fuse, according to the weights, the cross entropy losses corresponding to the optimized labels and the other basic labels to obtain a fusion loss, and train the fine-grained remote sensing target detection model with minimizing the fusion loss as the optimization objective to obtain the trained fine-grained remote sensing target detection model.

The present specification also provides a computer-readable storage medium storing a computer program operable to perform the label optimization method provided in fig. 1 above.

The present specification also provides a schematic structural diagram, shown in fig. 5, of an electronic device corresponding to fig. 1. At the hardware level, as illustrated in fig. 5, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the label optimization method described above with respect to fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description; that is, the execution subject of the following processing flows is not limited to logic units, and may also be hardware or logic devices.

An improvement to a technology could formerly be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to current method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs to "integrate" a digital system onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained merely by slightly logically programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the corresponding parts of the description of the method embodiments.

The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims

1. A method of tag optimization for training a fine-grained remote sensing target detection model, the method comprising:

2. The method according to claim 1, wherein acquiring each base label of the sample image comprises:

3. The method according to claim 2, wherein optimizing the original label of the sample image according to the degree of agreement between each of the pseudo labels and each of the original labels and the confidence level corresponding to each of the pseudo labels, specifically comprises:

4. The method according to claim 2, wherein optimizing the original label of the sample image according to the degree of coincidence between each pseudo label and each original label and the confidence corresponding to each pseudo label, comprises:

5. The method of claim 1, wherein pre-training a fine-grained remote sensing target detection model specifically comprises:

6. The method of claim 1, wherein for each base label, determining the quality weight corresponding to the base label according to the degree of agreement between the base label and the predictive labels in the set of matched labels corresponding to the base label, and the confidence level of each predictive label included in the set of matched labels corresponding to the base label, specifically comprises:

7. The method of claim 6, wherein determining the quality weight corresponding to the base label according to the label class of the predictive label in the set of matched labels corresponding to the base label and the confidence level of each predictive label included in the set of matched labels corresponding to the base label, specifically comprises:

8. The method of claim 1, wherein training the fine-grained remote sensing target detection model based on the optimized tags and other base tags, comprises:

9. The method of any one of claims 1-8, wherein the fine-grained remote sensing target detection model comprises: a feature extraction network and a two-stage detector, wherein the two-stage detector comprises a receptive field area learner (RRoI Learner) and receptive field area deformation (RRoI Warping).

10. A label optimizing apparatus, comprising: