CN117523345A - Target detection data balancing method and device - Google Patents

Target detection data balancing method and device

Info

Publication number
CN117523345A
CN117523345A
Authority
CN
China
Prior art keywords
target
initial image
amplified
background
image
Prior art date
Legal status
Granted
Application number
CN202410024623.6A
Other languages
Chinese (zh)
Other versions
CN117523345B (en)
Inventor
罗芳
马佳星
周莹静
颜昆
何芷馨
罗妍婕
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority: CN202410024623.6A
Publication of CN117523345A
Application granted
Publication of CN117523345B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection data balancing method and device, comprising the following steps: analyzing a target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index for each initial image; carrying out data local amplification on the initial images according to the original labeling information to obtain local amplified images, and training a preset background-category association model on the preset number of local amplified images to obtain a target background-category association model; and carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state. By setting the background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index, the invention obtains a target detection data set in a balanced state and solves the technical problem of data imbalance in target detection data.

Description

Target detection data balancing method and device
Technical Field
The invention relates to the technical field of image data detection, in particular to a target detection data balancing method and device.
Background
Object detection is one of the core tasks in computer vision; it aims to locate and classify objects of interest in images. Localization and classification depend partly on the characteristic information of the target itself and partly on the contextual information of its environment, which helps the model converge quickly. In many target detection data sets, however, data imbalance is a core problem; by cause, it can be categorized into category imbalance, scale imbalance and spatial imbalance.
Category imbalance: owing to differences in acquisition environments and manual labeling, existing target detection data sets generally suffer from an uneven distribution of sample numbers across categories. This can lead to poor model performance on minority categories, because the model does not have enough samples from which to learn their characteristics.
Scale imbalance: the target objects in a data set differ greatly in size. Such a non-uniform scale distribution causes problems for a detection model handling objects of different sizes: the model may detect large objects easily while ignoring small ones, or produce false or missed detections on small objects.
Spatial imbalance: target objects are unevenly distributed across regions of the images in the data set. Some regions may contain many targets while others contain few or none, which can cause the detection model to overfit certain regions and underfit others during training.
Therefore, there is an urgent need for a target detection data balancing method and device that address the category imbalance, scale imbalance and spatial imbalance of data in existing target detection data sets, which cause poor performance when detecting images.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a target detection data balancing method and device for solving the technical problem, in the prior art, of poor image detection performance caused by category imbalance, scale imbalance and spatial imbalance of data in target detection data sets.
In one aspect, the present invention provides a target detection data balancing method, including:
acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target space distribution index is used for spatially balancing an initial image that is in a state of spatial imbalance;
carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
and carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
In some possible implementations, the analyzing the target detection data set, setting a class expansion ratio, a scale ratio rule, and a target spatial distribution index corresponding to each initial image, includes:
analyzing the target detection data set to obtain class distribution conditions of all the examples, determining expansion probability according to differences among the number of examples of each class in the class distribution conditions, and further determining class expansion ratio according to the expansion probability;
performing scale analysis on all original marked information in the target detection data set to obtain scale distribution conditions, and determining scale proportion rules according to the scale distribution conditions;
and determining a filling mask map according to the original labeling information corresponding to each initial image, and analyzing the filling mask map to obtain a target space distribution index corresponding to each initial image.
In some possible implementations, the analyzing the filling mask map to obtain the target spatial distribution index corresponding to each initial image includes:
dividing the filling mask image of each initial image to obtain grid areas consisting of a preset number of grids;
performing concentration calculation on each grid in the grid area to obtain target concentration;
and determining the region type corresponding to each grid according to the target density, and determining the target spatial distribution index of the initial image according to all the region types of the preset number of grids in the grid region.
In some possible implementations, the performing data local amplification on the preset number of initial images according to the original labeling information to obtain a preset number of local amplified images includes:
Performing instance screening on all instances in the filling mask graph of each initial image to obtain non-overlapped target instances to be amplified, and acquiring the target instances to be amplified, which contain background information, in each initial image to obtain a labeling information set corresponding to each initial image; the annotation information set comprises all target instances to be augmented and the corresponding original annotation information;
performing spatial gesture transformation on all the target instances to be amplified in the annotation information set, correspondingly adjusting original annotation information corresponding to each target instance to be amplified in the annotation information set, and obtaining a transformation annotation information set corresponding to each initial image;
and filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image.
In some possible implementations, the filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image includes:
determining a first target instance to be amplified in all target instances to be amplified according to the transformation annotation information set of each initial image, and determining a preset number of placement positions in the corresponding initial image according to the size of the first target instance to be amplified; the placement position is positioned at the blank background;
calculating the first target instance to be amplified and each placement position respectively to obtain the similarity corresponding to each placement position;
determining the maximum similarity according to all the similarities, and filling the placement position corresponding to the maximum similarity according to the first target instance to be amplified and the adjusted original labeling information to obtain a new target instance;
updating the original annotation information in the transformation annotation information set according to the new target instance, and performing mask filling on the same region of the filling mask map to obtain a new initial image corresponding to each initial image;
and obtaining a local amplification image corresponding to each initial image according to the new initial image.
In some possible implementations, the obtaining the local amplification image corresponding to each initial image according to the new initial image includes:
Judging whether a transformation target instance to be amplified exists in the transformation labeling information set or not;
if yes, replacing the new initial image with an initial image, determining a second target instance to be amplified from the transformation annotation information set, and carrying out data local amplification on the initial image according to the second target instance to be amplified;
if not, determining the new initial image as a local amplification image, and obtaining the local amplification image corresponding to each initial image when the data local amplification of each initial image is completed.
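The placement-and-iteration loop described above can be sketched on toy 2-D list "images". Everything here (the similarity callable, list-based images and masks, annotation dicts, all function names) is an illustrative assumption rather than the patent's implementation; the sketch only mirrors the documented flow of scoring blank-background positions, filling the best one, and mask-filling the same region so later instances cannot overlap it.

```python
def paste(image, patch, pos):
    # Write a 2-D patch into a 2-D list image at (row, col) = pos.
    y0, x0 = pos
    for dy, row in enumerate(patch):
        for dx, v in enumerate(row):
            image[y0 + dy][x0 + dx] = v

def region_is_blank(mask, pos, h, w):
    # A candidate placement must lie entirely on blank (0) background.
    y0, x0 = pos
    return all(mask[y0 + dy][x0 + dx] == 0 for dy in range(h) for dx in range(w))

def local_augment(image, mask, instances, similarity):
    """Simplified data-local-amplification loop (a sketch).

    `instances` is a list of (patch, annotation) pairs; `similarity`
    scores how well a patch matches the background at a position.
    Each instance is pasted at its best-scoring blank position, and the
    filling mask map is updated so subsequent instances cannot overlap.
    """
    new_annotations = []
    H, W = len(image), len(image[0])
    for patch, ann in instances:
        h, w = len(patch), len(patch[0])
        positions = [(y, x) for y in range(H - h + 1) for x in range(W - w + 1)
                     if region_is_blank(mask, (y, x), h, w)]
        if not positions:
            continue  # no blank background large enough; skip this instance
        best = max(positions, key=lambda p: similarity(image, patch, p))
        paste(image, patch, best)                      # fill best position
        paste(mask, [[1] * w for _ in range(h)], best)  # mask-fill same region
        new_annotations.append({**ann, "pos": best})
    return new_annotations
```

Because the mask is updated after every paste, re-running the loop on the returned image naturally implements the "replace the new initial image with an initial image and continue" step.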
In some possible implementations, training a preset background-class association model according to the preset number of local augmentation images to obtain a target background-class association model includes:
obtaining an extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image;
obtaining a background-target image data set according to all the expanded background images of the preset number of initial images, and dividing the background-target image data set to obtain a training set and a testing set;
and training the preset background-category association model according to the training set and the testing set to obtain a target background-category association model.
In some possible implementations, the obtaining the extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image includes:
dividing the background images of the preset number of placement positions of each target instance to be amplified of each initial image to obtain a preset number of background images corresponding to each target instance to be amplified;
storing the preset number of background images according to the category information in the adjusted original labeling information of each target instance to be amplified to obtain a category background image corresponding to each target instance to be amplified;
and according to the image quantity of each category in the background images of all categories of all the target instances to be amplified, carrying out overturn expansion on the background images of all categories to obtain expansion background images corresponding to each initial image.
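The flip-based expansion of per-category background images might look like the following sketch. Topping each category up to a common target count using horizontal flips is an assumed reading of the description, and all names and the list-of-lists image representation are illustrative:

```python
def expand_backgrounds(category_backgrounds, target_count):
    """Balance per-category background image counts by flip expansion.

    `category_backgrounds` maps category -> list of 2-D image patches.
    Under-represented categories are topped up with horizontally
    flipped copies of their existing patches (flip choice is an
    assumption; the patent only states that flipping is applied).
    """
    expanded = {}
    for cat, images in category_backgrounds.items():
        out = list(images)
        i = 0
        while len(out) < target_count and images:
            # Horizontal flip: reverse each row of the source patch.
            flipped = [row[::-1] for row in images[i % len(images)]]
            out.append(flipped)
            i += 1
        expanded[cat] = out
    return expanded
```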
In some possible implementations, the performing data global augmentation on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule, and the target spatial distribution index to obtain a balanced target detection data set in a balanced state includes:
obtaining a preset number of target instances suitable for copy expansion according to the original labeling information of each initial image in the target detection data set;
controlling the category and scale proportion of the preset number of target examples according to the category expansion ratio and the scale proportion rule to obtain the preset number of control target examples, and determining the initial images with the target space distribution indexes smaller than a preset threshold value as initial images to be expanded, so as to obtain the preset number of initial images to be expanded;
obtaining a background area set which is not overlapped with the preset number of control target examples in each initial image to be expanded according to the scale proportion of the missing of each initial image to be expanded;
determining a preset number of regions to be expanded in the background region set according to the target space distribution index;
performing data global amplification on the preset number of areas to be amplified according to the target background-category association model to obtain amplified images after the amplification of each initial image to be amplified;
judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold value or not;
if not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image;
if yes, obtaining a balanced target detection data set in a balanced state according to the amplified images, among the preset number of initial images to be expanded, whose target spatial distribution indexes are larger than the expected balance threshold.
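The termination logic described above, re-augmenting an image until its target spatial distribution index exceeds the expected balance threshold, can be sketched as a loop skeleton; the threshold value and the iteration cap guarding against non-convergence are assumptions:

```python
def global_augment(image_state, augment_once, tsd, balance_threshold=0.5):
    """Data-global-amplification outer loop (sketch).

    `augment_once` applies one round of global augmentation and returns
    the updated state; `tsd` scores spatial balance in [0, 1].  The
    0.5 default threshold and the 100-iteration guard are assumptions.
    """
    for _ in range(100):  # guard against non-convergence
        if tsd(image_state) > balance_threshold:
            break  # expected balance reached; stop augmenting
        image_state = augment_once(image_state)
    return image_state
```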
On the other hand, the invention also provides a target detection data balancing device, which comprises:
the data acquisition module is used for acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
the data analysis module is used for analyzing the target detection data set and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
the image amplification module is used for carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplification images, and training a preset background-category association model according to the preset number of local amplification images to obtain a target background-category association model;
and the image balancing module is used for carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
The beneficial effects of adopting the embodiment are as follows. In the target detection data balancing method, a background-category association model is set, and data local amplification is performed on each initial image using its original labeling information to obtain local amplified images; the background-category association model can then be trained on these local amplified images, which avoids overfitting or underfitting of the model during region embedding. Further, by analyzing the target detection data set, a category expansion ratio, a scale proportion rule and a target space distribution index are set for each initial image, so that spatially unbalanced initial images can be identified by the target space distribution index and globally amplified according to the category expansion ratio and the scale proportion rule to obtain spatially balanced initial images, and thus a balanced target detection data set. Detecting images with a model trained on this balanced data set improves the detection effect and solves the technical problem of data imbalance in target detection data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings needed in the description of the embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an embodiment of a method for balancing object detection data according to the present invention;
FIG. 2 is a schematic diagram of a data local amplification structure according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the step S104 of FIG. 1 according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of comparison of scale distribution adjustment provided by the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a target detection data balancing apparatus according to the present invention;
fig. 6 is a schematic structural diagram of an embodiment of an electronic device according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor systems and/or microcontroller systems.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The embodiment of the invention provides a target detection data balancing method and device, which are respectively described below.
Fig. 1 is a flow chart of an embodiment of a target detection data balancing method according to the present invention, where, as shown in fig. 1, the target detection data balancing method includes:
s101, acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
S102, analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target space distribution index is used for spatially balancing an initial image that is in a state of spatial imbalance;
s103, carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
and S104, carrying out data global augmentation on the target detection data set according to the target background-category association model, the category augmentation ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
Compared with the prior art, the target detection data balancing method provided by the embodiment of the invention has the advantages that the background-type association model is set, the original labeling information of the initial image is used for carrying out data local amplification on the initial image, so that the local amplification image is obtained, and further, the background-type association model can be trained through the local amplification image, so that the situation of overfitting or insufficient fitting of the background-type association model can be avoided when the region is embedded. Further, by analyzing the target detection data set, the class expansion ratio, the scale proportion rule and the target space distribution index corresponding to each initial image are set, so that unbalanced initial images can be processed according to the target space distribution index, and data global expansion can be carried out on the unbalanced initial images according to the class expansion ratio and the scale proportion rule to obtain initial images in space balance, and further the balance target detection data set is obtained, so that when the images are detected according to the balance target detection data set, the detection effect is improved, and the technical problem of data unbalance of target detection data is solved.
In a specific embodiment of the present invention, the target detection data set may be VOC 2007+2012, divided into a training set of 15412 images and a test set of 1713 images; the target detection data set used in the subsequent steps is the training set. The target detection data set may include images and the original labeling information corresponding to each image, the original labeling information being obtained by annotating the image.
For the case where same-category targets within a single initial image of the target detection data set share similar visual context backgrounds, a fine-grained background embedding scheme is provided: a data local augmentation method (Data Local Augmentation, DLA) is designed on the premise that similar backgrounds within a single image belong to the same visual context. According to the original labeling information, a copy of an annotated target is spatially reconstructed and embedded into a region of the same image whose background is similar to that of the target; the spatial reconstruction of the original target includes basic image processing operations such as flipping, rotation and scaling.
For the case where the visual context information of same-category targets is inconsistent across different initial images of the target detection data set, a coarse-grained aligned target enhancement scheme is provided: a background-category association model (Background-to-Category Predictor, BCP) is designed to infer, from an input background image, the target categories associated with that background, effectively improving the accuracy and interpretability of visual scene analysis. The BCP uses ResNet-18 as its feature extraction network and predicts the probability of each category occurring in the background with a fully connected layer.
To train the BCP model, a background-category data set (Background-to-Category Dataset, BCD) is constructed: the data local augmentation method (DLA) is used to crop background-similar regions and label each region with the corresponding target category, and various image enhancement operations such as rotation and flipping are applied to improve the quality of the BCD.
A data global augmentation method (Data Global Augmentation, DGA) is designed. The method first extracts random background images from an initial image at a set scale; second, it obtains the categories associated with each background using the background-category association model (BCP); finally, it selects target images of those categories from all target instances and copies them into the backgrounds.
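The three DGA steps above can be sketched as follows. This is a toy stand-in, not the patented implementation: `predict_categories` substitutes for the trained ResNet-18 BCP, and all names, data structures and the fixed random seed are assumptions:

```python
import random

def dga_step(image, instance_pool, predict_categories, crop_background, rng=None):
    """One DGA step (sketch): crop a random background region, ask the
    background-to-category predictor which categories fit it, then copy
    in a stored target instance of the best-matching category.

    `predict_categories` stands in for the trained BCP; here it is any
    callable returning {category: probability} for a background crop.
    """
    rng = rng or random.Random(0)
    region, crop = crop_background(image, rng)  # step 1: random background
    probs = predict_categories(crop)            # step 2: BCP inference
    best_cat = max(probs, key=probs.get)
    candidates = instance_pool.get(best_cat, [])
    if not candidates:
        return None                             # no stored instance to paste
    patch = rng.choice(candidates)              # step 3: select and copy
    return {"region": region, "category": best_cat, "patch": patch}
```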
In some embodiments of the present invention, step S102 includes:
analyzing the target detection data set to obtain class distribution conditions of all the examples, determining expansion probability according to differences among the number of examples of each class in the class distribution conditions, and further determining class expansion ratio according to the expansion probability;
performing scale analysis on all original marked information in the target detection data set to obtain scale distribution conditions, and determining scale proportion rules according to the scale distribution conditions;
and determining a filling mask map according to the original labeling information corresponding to each initial image, and analyzing the filling mask map to obtain a target space distribution index corresponding to each initial image.
In a specific embodiment of the present invention, after the target detection data set is comprehensively analyzed, the category distribution, scale distribution and spatial distribution of the data set are obtained. From the category distribution, the difference between the numbers of instances of each category is abstracted into an expansion probability, from which the category expansion ratio is designed. The category expansion ratio (CER) is shown in formula (1):
(1)
where num is the number of instances of the corresponding category in the target detection data set, and i denotes the i-th category.
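Formula (1) itself is not reproduced in this text. A minimal sketch of one plausible reading, in which each category's expansion probability grows with its gap from the most numerous category, is given below; the function name and the linear-gap normalization are assumptions, not the patented formula:

```python
def category_expansion_ratio(counts):
    """Per-category expansion probability from instance counts.

    Categories far below the largest category get a ratio close to 1,
    while the majority category gets 0.  This linear-gap normalization
    is an assumed stand-in for formula (1).
    """
    max_count = max(counts.values())
    return {cls: (max_count - n) / max_count for cls, n in counts.items()}

# Toy category distribution (counts are illustrative, not from VOC).
counts = {"car": 900, "bus": 300, "bicycle": 90}
cer = category_expansion_ratio(counts)
```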
The scale distribution is obtained by counting all original labeling information of all initial images in the target detection data set according to a scale classification standard, and a scale proportion rule can then be formulated from the scale distribution. The specific scale classification standard can be set according to the actual situation, which is not limited by the embodiment of the invention. The scale proportion (scale) is shown in formula (2):
(2)
where w and h are, respectively, the width and height recorded in the original labeling information.
The scale proportion rule (SRR) is shown in formula (3):
(3)
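Formulas (2) and (3) are likewise not reproduced here. A common convention consistent with the description, taking the scale as the geometric mean of the annotated width and height and bucketing it into small/medium/large, can be sketched as follows; the COCO-style 32- and 96-pixel cut-offs are assumptions:

```python
import math

def box_scale(w, h):
    # Scale of an annotated region: geometric mean of width and height
    # (an assumed reading of formula (2)).
    return math.sqrt(w * h)

def scale_bucket(w, h, small=32, large=96):
    # Assumed scale classification standard (COCO-style thresholds).
    s = box_scale(w, h)
    if s < small:
        return "small"
    if s < large:
        return "medium"
    return "large"

def scale_distribution(boxes):
    # Count annotations per bucket over all original labeling information.
    dist = {"small": 0, "medium": 0, "large": 0}
    for w, h in boxes:
        dist[scale_bucket(w, h)] += 1
    return dist
```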
To evaluate the degree of spatial imbalance of targets in an initial image and the spatial balancing effect of the DGA (Data Global Augmentation) scheme, a filling mask map Mask_i is generated for each initial image in the target detection data set from its original labeling information, populated according to the target-instance existence principle, and a target spatial distribution index (TSD) is designed on it. The index ranges from 0 to 1; the closer it is to 0, the more severe the spatial distribution imbalance of the image.
In some embodiments of the present invention, analyzing the filling mask map to obtain a target spatial distribution index corresponding to each initial image includes:
dividing a filling mask image of each initial image to obtain grid areas formed by a preset number of grids;
performing concentration calculation on each grid in the grid area to obtain target concentration;
and determining the region type corresponding to each grid according to the target density, and determining the target spatial distribution index of the initial image according to all the region types of the preset number of grids in the grid region.
In a specific embodiment of the present invention, the fill mask map Mask_i is first divided into a specified number of grids (Grid_xy) to obtain a grid area composed of a preset number of grids, and the target density is then calculated by traversing each grid Grid_xy of the fill mask map Mask_i. The number of grids may be controlled according to the average length-width ratio of target instances in the target data set, which is not limited here. The target density within each grid Grid_xy is calculated as shown in formula (4):
(4)
The target density is measured by counting the number of elements with a value of 1 in the grid. If this number exceeds a set density threshold (threshold), the grid area is considered a dense area; the specific density threshold can be configured according to the actual situation and is not limited here. The existence of a dense region is shown in formula (5):
(5)
After traversing all grids Grid_xy, the number of dense areas is counted, and the overall spatial distribution is calculated based on the region type of each grid Grid_xy. This procedure evaluates the distribution of targets in the image to determine the degree of spatial imbalance.
The target spatial distribution index (TSD) is calculated as shown in equation (6):
(6)
where m and n are respectively the number of grid rows and columns.
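The grid traversal described by formulas (4) and (5) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the per-cell slicing scheme, and the toy 4×4 mask are assumptions, and the final TSD combination of formula (6) is not reproduced.

```python
import numpy as np

def grid_density_map(mask, m, n, threshold):
    """Split a binary fill-mask into an m x n grid and measure the target
    density per cell (formula (4)); a cell whose count of 1-valued pixels
    exceeds `threshold` is flagged as a dense region (formula (5))."""
    h, w = mask.shape
    densities = np.zeros((m, n), dtype=int)
    dense = np.zeros((m, n), dtype=bool)
    for x in range(m):
        for y in range(n):
            cell = mask[x * h // m:(x + 1) * h // m,
                        y * w // n:(y + 1) * w // n]
            densities[x, y] = int(cell.sum())      # number of 1-pixels in Grid_xy
            dense[x, y] = densities[x, y] > threshold
    return densities, dense

# toy 4x4 mask with one fully occupied quadrant
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1
d, flags = grid_density_map(mask, 2, 2, threshold=2)
```

On this toy mask, only the top-left grid cell exceeds the threshold and is flagged dense; the counts of dense and non-dense cells would then feed the TSD of formula (6).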
In some embodiments of the present invention, step S103 includes:
performing instance screening on all instances in the filling mask graph of each initial image to obtain non-overlapped target instances to be amplified, and acquiring the target instances to be amplified, which contain background information, in each initial image to obtain a labeling information set corresponding to each initial image; the annotation information set comprises all target instances to be augmented and corresponding original annotation information;
carrying out space gesture transformation on all target instances to be amplified in the annotation information set, correspondingly adjusting original annotation information corresponding to each target instance to be amplified in the annotation information set, and obtaining a transformation annotation information set corresponding to each initial image;
and filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplification image corresponding to each initial image.
In a specific embodiment of the present invention, all instances in the fill mask map of each initial image are screened to obtain non-overlapping target instances to be augmented, and the target instances to be augmented containing background information in each initial image are acquired in turn as the annotation information set O_i{T, B}. The annotation information set may include all target instances to be augmented T_i and the corresponding original annotation information B_i. Spatial pose transformation (flipping and scaling) is applied to all target instances to be augmented in O_i, and the corresponding original annotation information in the set is adjusted accordingly, yielding the transformed annotation information set corresponding to each initial image, so as to increase the number of effective targets and enrich the selectable backgrounds.
In some embodiments of the present invention, as shown in fig. 2, filling an initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image, including:
s201, determining a first object instance to be amplified in all object instances to be amplified according to a transformation annotation information set of each initial image, and determining a preset number of placement positions in the corresponding initial image according to the size of the first object instance to be amplified; the placement position is positioned at the blank background;
s202, calculating a first target instance to be amplified and each placement position respectively to obtain the similarity corresponding to each placement position;
s203, determining the maximum similarity according to all the similarities, and filling the placement position corresponding to the maximum similarity according to the first target instance to be amplified and the adjusted original labeling information to obtain a new target instance;
S204, updating original annotation information in the transformation annotation information set according to the new target instance, and performing mask filling on the same region of the filling mask map to obtain a new initial image corresponding to each initial image;
s205, obtaining a local amplification image corresponding to each initial image according to the new initial image.
In the embodiment of the invention, the process of obtaining the locally augmented image is the same for each initial image. Taking one initial image as an example, the transformed annotation information set of the initial image can be traversed to obtain a corresponding first target instance to be augmented T_j and its annotation B_j. According to the size of T_j, suitable placement positions are searched for in the initial image, and there may be several. While traversing the initial image, regions where filled targets appear in the fill mask map Mask_i are skipped at the corresponding positions, leaving only the blank-background positions C_k, i.e. the preset number of placement positions, which ensures that the augmented target and the original targets do not overlap.
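The search for blank-background placement positions can be sketched as a sliding-window scan over the fill mask. The window stride and the list-of-corners return format are illustrative assumptions.

```python
import numpy as np

def candidate_positions(mask, tw, th, stride):
    """Scan the fill-mask for tw x th windows that contain no already
    placed target (all zeros): these blank-background windows are the
    candidate placement positions C_k for the instance being copied."""
    H, W = mask.shape
    spots = []
    for y in range(0, H - th + 1, stride):
        for x in range(0, W - tw + 1, stride):
            if mask[y:y + th, x:x + tw].sum() == 0:  # skip filled regions
                spots.append((x, y))
    return spots
```

Because every returned window is empty in the mask, pasting into any of them cannot overlap an original target, which is the non-overlap guarantee the text describes.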
To preserve to the greatest extent the similarity between the first target instance to be augmented T_j and the fine-grained background information of the placement position, the degree of fit between T_j and C_k is evaluated through a similarity calculation algorithm. First, a certain proportion of the background image around T_j and C_k (I_T, I_C) and the background histogram information (H_T, H_C) are extracted; the algorithm integrates pixel differences, histogram differences, and structural changes of the background surrounding the target instance. The pixel gap is shown in formula (7):
(7)
the histogram difference formula is:
(8)
the structural variation formula is:
(9)
where μ represents the luminance mean of the image, σ represents the luminance standard deviation, and σ_xy represents the luminance covariance between the two images. C is a constant used to avoid the case where the denominator is zero, and is usually set to a very small positive number.
The similarity calculation formula is as follows:
(10)
where α, β, and γ are the scoring weights, which may be set to 0.7, 0.2, and 0.1, respectively.
All the similarities of all the background images in the initial image can be sorted, the background image with the maximum similarity determined and selected, and the first target instance to be augmented together with its adjusted original annotation information copied and filled into the designated placement position corresponding to that background image, obtaining a new target instance. The original annotation information in the transformed annotation information set is then updated, and the same region of the fill mask map Mask_i is mask-filled, obtaining a new initial image corresponding to each initial image.
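The fused similarity score of formulas (7)-(10) can be sketched as follows. The exact pixel and histogram distance definitions are not reproduced in the text, so mean absolute difference and an L1 histogram distance are assumptions here; the structural term follows the standard global SSIM form described around formula (9), and the 0.7/0.2/0.1 weights match the text.

```python
import numpy as np

def fusion_similarity(a, b, alpha=0.7, beta=0.2, gamma=0.1, bins=16):
    """Weighted fusion of pixel, histogram and structural agreement between
    the instance's surrounding background `a` and a candidate placement
    background `b`; higher means a better fit for pasting."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # pixel term: mean absolute gap, mapped to a 0-1 similarity (assumption)
    s_pixel = 1.0 - np.abs(a - b).mean() / 255.0
    # histogram term: normalized L1 distance between luminance histograms (assumption)
    ha = np.histogram(a, bins=bins, range=(0, 255))[0] / a.size
    hb = np.histogram(b, bins=bins, range=(0, 255))[0] / b.size
    s_hist = 1.0 - 0.5 * np.abs(ha - hb).sum()
    # structural term: global SSIM with a small constant guarding the denominator
    c = 1e-6
    cov = ((a - a.mean()) * (b - b.mean())).mean()
    ssim = ((2 * a.mean() * b.mean() + c) * (2 * cov + c) /
            ((a.mean() ** 2 + b.mean() ** 2 + c) *
             (a.std() ** 2 + b.std() ** 2 + c)))
    return alpha * s_pixel + beta * s_hist + gamma * ssim
```

Identical patches score 1.0, and the placement position with the maximum score is the one filled, as the paragraph above describes.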
In some embodiments of the present invention, obtaining a locally amplified image corresponding to each initial image according to the new initial image includes:
judging whether a transformation target instance to be amplified exists in the transformation labeling information set or not;
if yes, replacing the new initial image with the initial image, determining a second target instance to be amplified from the transformation annotation information set, and carrying out data local amplification on the initial image according to the second target instance to be amplified;
if not, determining the new initial image as a local amplification image, and obtaining the local amplification image corresponding to each initial image when the data local amplification of each initial image is completed.
In a specific embodiment of the present invention, it may be determined whether a transformation target instance to be amplified exists in the transformation annotation information set, if so, the new initial image may be replaced with the initial image, then a second target instance to be amplified is determined from the transformation annotation information set, and according to the second target instance to be amplified, data local amplification is performed on the initial image, and the process is consistent with the steps of the first target instance to be amplified, thereby implementing the loop of steps S201 to S205, when the transformation annotation information set does not exist the transformation target instance to be amplified, the loop is indicated to be ended, the new initial image is determined to be a local amplified image, and when all the initial images in the transformation annotation information set are subjected to data local amplification, the local amplified image corresponding to each initial image may be obtained.
In some embodiments of the present invention, step S103 includes:
obtaining an extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image;
obtaining a background-target image data set according to all the expanded background images of the initial images with the preset number, and dividing the background-target image data set to obtain a training set and a testing set;
training the preset background-category association model according to the training set and the testing set to obtain the target background-category association model.
In a specific embodiment of the present invention, a background-target image dataset (BCD) is built from the blank backgrounds C obtained in the data local augmentation (DLA) method, and a background-class association model (BCP) is designed to establish the association between background and class.
In some embodiments of the present invention, obtaining an extended background image corresponding to each initial image according to a background image of a preset number of placement positions of each target instance to be amplified of each initial image, including:
dividing the background images of the preset number of placement positions of each target instance to be amplified of each initial image to obtain the preset number of background images corresponding to each target instance to be amplified;
Storing a preset number of background images according to category information in the adjusted original labeling information of each target instance to be amplified to obtain a category background image corresponding to each target instance to be amplified;
and according to the image quantity of each category in the background images of all categories of all the object instances to be amplified, carrying out overturn expansion on the background images of all categories to obtain an expansion background image corresponding to each initial image.
In a specific embodiment of the present invention, the background images (blank backgrounds) of the preset number of placement positions of each target instance to be augmented of each initial image are segmented to obtain the preset number of background images corresponding to each target instance; according to the category information in the adjusted annotation B_j of each target instance to be augmented, the preset number of background images are stored as "{class}{index}.jpg", obtaining the category background images corresponding to each target instance to be augmented. Because the number of images in each category is unequal across the category background images of all target instances to be augmented, expansion operations such as flipping and rotation can be adopted to balance them, applying data enhancement of different grades to different image categories. The specific operations include flipping [horizontal, vertical, horizontal-and-vertical] (flip[1, 0, -1]) and rotating [90° right, 90° left, 180°] (angle[90, -90, 180]), thus obtaining the corresponding expanded background image for each initial image.
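The graded flip/rotate expansion can be sketched with numpy. The flips and rotations match the flip[1, 0, -1] / angle[90, -90, 180] operations above; the level-to-category mapping is an illustrative assumption, since the text only says different grades are used for different image categories.

```python
import numpy as np

def expand_background(img, level):
    """Graded expansion of one category background image. Level 1 applies
    the three flips (horizontal, vertical, both); level 2 additionally
    applies the three rotations (90 deg right, 90 deg left, 180 deg)."""
    out = [np.fliplr(img),                # horizontal flip  (flip=1)
           np.flipud(img),                # vertical flip    (flip=0)
           np.flipud(np.fliplr(img))]     # both axes        (flip=-1)
    if level >= 2:
        out += [np.rot90(img, k=-1),      # 90 deg right  (angle=90)
                np.rot90(img, k=1),       # 90 deg left   (angle=-90)
                np.rot90(img, k=2)]       # 180 deg       (angle=180)
    return out
```

An under-represented category would be assigned the higher level so that it gains up to six extra images per background, rebalancing the category counts in the BCD.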
Further, all the expanded background images of the preset number of initial images may be saved in the data set to obtain a background-target image data set (BCD), for example, the background-target image data set (BCD) may be an amplified data set, may include 14903 images, and 20 categories, and the background-target image data set (BCD) may be divided into a training set and a test set according to a ratio of 8:2, and a specific division ratio may be set according to an actual situation.
Further, ResNet-18 can be used as the feature extraction network of the background-class association model (BCP); class prediction is performed on the extracted features using a fully connected layer, and Softmax is used to map the prediction result to a probability in 0-1. Model training uses cross-entropy loss to calculate the gap between the model output and the real labels; the model can be trained on the training set and test set and adjusted through stochastic gradient descent (SGD) to obtain the target background-class association model.
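The prediction head's math can be illustrated in a few lines. The ResNet-18 feature extractor itself is not reproduced; this sketch only shows how the fully connected layer's logits become 0-1 probabilities via Softmax and how cross-entropy measures the gap to the real label.

```python
import numpy as np

def softmax(logits):
    """Map the fully connected layer's output to 0-1 probabilities."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Gap between the model output and the real label, as minimized by
    SGD when training the background-class association model."""
    p = softmax(logits)
    return -np.log(p[label] + 1e-12)
```

Pushing the correct class's logit up strictly lowers this loss, which is what each SGD step on the BCP does.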
In some embodiments of the present invention, as shown in fig. 3, step S104 includes:
s301, obtaining a preset number of target examples suitable for copy expansion according to original labeling information of each initial image in a target detection data set;
s302, controlling the category and scale proportion of a preset number of target examples according to the category expansion ratio and the scale proportion rule to obtain a preset number of control target examples, and determining an initial image with a target space distribution index smaller than a preset threshold value as an initial image to be expanded, so as to obtain a preset number of initial images to be expanded;
s303, obtaining a background area set which is not overlapped with a preset number of control target examples in each initial image to be expanded according to the missing scale proportion of each initial image to be expanded;
s304, determining a preset number of regions to be expanded in a background region set according to the target space distribution index;
s305, carrying out data global expansion on a preset number of areas to be expanded according to a target background-category association model to obtain an expanded image after each initial image to be expanded is expanded;
s306, judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold value;
s307, if not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image;
And S308, if so, obtaining a balance target detection data set in a balance state according to the preset number of amplified images which are larger than the expected balance threshold value in the initial images to be expanded.
In the specific embodiment of the invention, according to the original annotation information of each initial image in the target detection data set, a large number of target instances suitable for copy expansion can be extracted by applying the non-overlapping principle and scale division, forming an instance pool. These instances all contain 20% of the original background information around each instance, which is used to keep fine-grained background information invariant during copying. The fill mask map Mask_i is generated at the same time for calculating the target spatial distribution index. The generation of Mask_i and the calculation of the target spatial distribution index are identical to the process in step S102 and are not repeated here.
The category and scale proportion of the copied instances can be controlled according to the category expansion ratio and the scale ratio rule to obtain the preset number of control target instances, and the preset number of initial images to be expanded can be determined among the preset number of initial images by calculating the target spatial distribution index TSD of Mask_i: an initial image whose TSD is smaller than a preset threshold may be determined as an initial image to be expanded, and the specific preset threshold can be set according to the actual situation. If no initial image to be expanded exists, the flow ends; if one exists, the missing scale proportion of each initial image to be expanded is determined according to its original annotation information, and a background region set Region that does not overlap with existing target instances is generated. Which regions need to be expanded can thus be obtained from the target spatial distribution index procedure; for example, regions with a target spatial distribution index less than 1 need to be expanded. Further, the preset number of regions to be expanded in the background region set can be determined and globally augmented according to the target background-class association model (BCP). Specifically, the BCP can be used to obtain class prediction information, and high-scoring classes exceeding a threshold can be selected as the association classes of the background, ensuring the correctness of the coarse-grained visual context.
The prediction process formula is:
(11)
where argmax(·) is the function that returns the index of the maximum element.
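The class-selection step of formula (11) can be sketched as follows: the argmax gives the top class, and every class whose score exceeds the threshold is kept as an association class for the background. The threshold value itself is a configuration choice not fixed by the text.

```python
import numpy as np

def associated_classes(probs, score_threshold):
    """Pick the argmax class from the BCP's per-class probabilities, and
    keep all high-scoring classes above the threshold as the background's
    association classes (formula (11))."""
    top = int(np.argmax(probs))
    kept = [c for c, p in enumerate(probs) if p > score_threshold]
    return top, kept
```

Only instances belonging to the kept classes are then drawn from the instance pool, which is what keeps the coarse-grained visual context correct.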
Multi-scale target instances of the corresponding category are then selected from the instance pool for copying according to the association class. One instance is randomly selected from the preset number of control target instances and scaled to the size of the background image, keeping its original aspect ratio unchanged. The adjusted instance is copied into the background image to obtain a new target instance, its annotation information is added to the annotation file corresponding to the initial image in the target detection data set, and the same region of the fill mask map is mask-filled to obtain the augmented image. The class ratio of the copies is also considered when selecting instances. Data local augmentation and data global augmentation are broadly similar: local augmentation augments each initial image according to the category and background picture of each target instance to be augmented in that image, whereas global augmentation processes and augments according to the categories and background pictures of all target instances in all initial images.
Further, the target spatial distribution index of the augmented image may be calculated (the calculation is consistent with the above and is not repeated), and whether it is greater than the expected balance threshold is judged. If not, global data augmentation must be performed again on the initial image corresponding to the augmented image, executing step S302 and the subsequent steps again until the target spatial distribution index of the augmented image exceeds the expected balance threshold, at which point the augmented image can be considered spatially balanced; the other initial images to be expanded are processed in the same way. The augmented image of each initial image to be expanded can then be obtained, and further a balanced target detection data set in a balanced state. As shown in fig. 4, which compares the scale distribution before balancing (left) and after balancing (right), the distribution of scale proportions in the target detection data set globally expanded by the target background-class association model is more reasonable, and the spatial distribution of the expanded data set is more even.
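The augment-until-balanced control flow of steps S301-S308 can be outlined as a simple loop. Both callables here are illustrative placeholders: `tsd` stands for the target spatial distribution index computation and `augment_once` for one round of BCP-guided copy-paste that raises it.

```python
def balance_dataset(images, tsd, augment_once, expected_tsd):
    """Keep globally augmenting each image whose target spatial
    distribution index has not yet exceeded the expected balance
    threshold; once every image passes, the balanced set is returned."""
    balanced = []
    for img in images:
        while tsd(img) <= expected_tsd:   # not yet spatially balanced
            img = augment_once(img)       # one more global augmentation round
        balanced.append(img)
    return balanced
```

The loop terminates only because each augmentation round is assumed to raise the image's TSD, matching the text's requirement that augmentation repeats until the index exceeds the expected balance threshold.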
According to the embodiment of the invention, unit target instances are obtained through the original annotation information of each initial image in the target detection data set, and effective target instances conforming to the visual context are added to the target detection data set through the target background-class association model (BCP), realizing global augmentation of the target detection data set and solving the technical problem that randomly copying and pasting target instances with semantic segmentation annotations risks damaging the context information of the data set. Further, by means of the category expansion ratio and the scale ratio rule, a background region set Region that does not overlap with existing target instances is generated and used for global augmentation of the target detection data set. This addresses the shortcoming of the prior art that attention is paid only to the background near the original target: such an absolute definition of non-target background ignores a large number of reasonable backgrounds that meet the condition and does not solve the imbalance problem.
In order to better implement the target detection data balancing method in the embodiment of the present invention, correspondingly, on the basis of the target detection data balancing method, the embodiment of the present invention further provides a target detection data balancing device, as shown in fig. 5, where the target detection data balancing device includes:
A data acquisition module 501 for acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
the data analysis module 502 is configured to analyze the target detection data set, and set a category expansion ratio, a scale proportion rule, and a target spatial distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on the initial image under the condition of spatial unbalance;
the image amplification module 503 is configured to perform data local amplification on a preset number of initial images according to the original labeling information to obtain a preset number of local amplified images, and train a preset background-class association model according to the preset number of local amplified images to obtain a target background-class association model;
the image balancing module 504 is configured to perform data global augmentation on the target detection data set according to the target background-category association model, the category augmentation ratio, the scale proportion rule, and the target spatial distribution index, so as to obtain a balanced target detection data set in a balanced state.
The target detection data balancing device provided in the foregoing embodiment may implement the technical solution described in the foregoing target detection data balancing method embodiment, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing target detection data balancing method embodiment, which is not described herein again.
As shown in fig. 6, the present invention further provides an electronic device 600 accordingly. The electronic device 600 comprises a processor 601, a memory 602 and a display 603. Fig. 6 shows only a portion of the components of the electronic device 600, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
The memory 602 may be an internal storage unit of the electronic device 600 in some embodiments, such as a hard disk or memory of the electronic device 600. The memory 602 may also be an external storage device of the electronic device 600 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 600.
Further, the memory 602 may also include both internal storage units and external storage devices of the electronic device 600. The memory 602 is used for storing application software and various types of data for installing the electronic device 600.
The processor 601 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chip for executing program code or processing data stored in the memory 602, such as the object detection data balancing method of the present invention.
The display 603 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like in some embodiments. The display 603 is used for displaying information at the electronic device 600 and for displaying a visual user interface. The components 601-603 of the electronic device 600 communicate with each other via a system bus.
In some embodiments of the present invention, when the processor 601 executes the object detection data balancing program in the memory 602, the following steps may be implemented:
acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on the initial image under the condition of spatial unbalance;
carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
And carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
It should be understood that: the processor 601 may perform other functions in addition to the above functions when executing the object detection data balancing program in the memory 602, see in particular the description of the corresponding method embodiments above.
Further, the type of the electronic device 600 is not particularly limited; the electronic device 600 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a wearable device, a laptop, or another portable electronic device. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running IOS, Android, Microsoft, or other operating systems. The portable electronic device may also be another portable electronic device having a touch-sensitive surface (e.g., a touch panel), such as a laptop computer. It should also be appreciated that in other embodiments of the invention, the electronic device 600 may not be a portable electronic device but a desktop computer having a touch-sensitive surface (e.g., a touch panel).
Accordingly, the embodiments of the present application further provide a computer readable storage medium, where the computer readable storage medium is used to store a computer readable program or instructions, and when the program or instructions are executed by a processor, the steps or functions of the target detection data balancing method provided in the foregoing method embodiments can be implemented.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by a computer program stored in a computer readable storage medium instructing related hardware (e.g., a processor, a controller, etc.). The computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The method and apparatus for balancing target detection data provided by the present invention are described in detail above. Specific examples are applied herein to illustrate the principles and embodiments of the invention, and the above examples are only used to help understand the method and its core ideas. Meanwhile, since those skilled in the art will make variations in the specific embodiments and application scope in light of the ideas of the present invention, this description should not be construed as limiting the invention.

Claims (10)

1. A method of balancing target detection data, comprising:
acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
and carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
2. The method for balancing target detection data according to claim 1, wherein analyzing the target detection data set, setting a class expansion ratio, a scale ratio rule, and a target spatial distribution index corresponding to each initial image, comprises:
analyzing the target detection data set to obtain the class distribution of all instances, determining an expansion probability according to the differences among the numbers of instances of each class in the class distribution, and further determining the class expansion ratio according to the expansion probability;
performing scale analysis on all original annotation information in the target detection data set to obtain a scale distribution, and determining the scale proportion rule according to the scale distribution;
and determining a filling mask map according to the original labeling information corresponding to each initial image, and analyzing the filling mask map to obtain a target space distribution index corresponding to each initial image.
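Claim 2 does not disclose the exact formulas behind the class expansion ratio; the sketch below is one plausible reading, in which a class's expansion probability is its shortfall relative to the most frequent class and the ratios are the normalised shares of that shortfall. Both formulas are assumptions, not the patented method.

```python
from collections import Counter

def class_expansion_ratios(labels):
    """Derive per-class expansion ratios from instance counts.

    Rarer classes receive a larger ratio so that augmentation
    preferentially copies their instances. The shortfall-based
    formula below is an assumption; the claim only states that the
    ratio follows from differences between per-class instance counts.
    """
    counts = Counter(labels)
    max_count = max(counts.values())
    # Expansion probability: how far each class falls short of the
    # most frequent class, normalised to [0, 1].
    probs = {c: (max_count - n) / max_count for c, n in counts.items()}
    total = sum(probs.values()) or 1.0
    # Expansion ratio: each class's share of the augmentation budget.
    return {c: p / total for c, p in probs.items()}

# 90 "car" instances vs 10 "bike" instances: the minority class
# receives the whole augmentation budget.
ratios = class_expansion_ratios(["car"] * 90 + ["bike"] * 10)
```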
3. The method for balancing target detection data according to claim 2, wherein the analyzing the filling mask map to obtain the target spatial distribution index corresponding to each initial image includes:
dividing the filling mask map of each initial image to obtain a grid area consisting of a preset number of grids;
performing concentration calculation on each grid in the grid area to obtain target concentration;
and determining the region type corresponding to each grid according to the target density, and determining the target spatial distribution index of the initial image according to all the region types of the preset number of grids in the grid region.
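Claim 3 leaves the density threshold and the mapping from per-grid region types to a single index unspecified; the following sketch assumes a binary fill mask, a fixed density threshold, and the fraction of dense cells as the index — all three are illustrative choices, not the claimed formula.

```python
import numpy as np

def spatial_distribution_index(mask, grid=4, dense_thresh=0.2):
    """Grid-based spatial balance score for a binary fill-mask map.

    Splits the mask into grid x grid cells, computes per-cell target
    density, classifies each cell as dense or sparse, and returns the
    fraction of dense cells as the spatial distribution index.
    """
    h, w = mask.shape
    dense = 0
    for i in range(grid):
        for j in range(grid):
            cell = mask[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            density = cell.mean()  # fraction of target pixels in the cell
            if density >= dense_thresh:
                dense += 1
    return dense / (grid * grid)

# All targets crowded into one corner: only 1 of 16 cells is dense,
# signalling spatial imbalance.
mask = np.zeros((64, 64))
mask[:16, :16] = 1
idx = spatial_distribution_index(mask)
```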
4. The method of claim 2, wherein the performing data local amplification on the preset number of initial images according to the original annotation information to obtain a preset number of local amplified images includes:
performing instance screening on all instances in the filling mask map of each initial image to obtain non-overlapped target instances to be amplified, and acquiring the target instances to be amplified, which contain background information, in each initial image to obtain an annotation information set corresponding to each initial image; the annotation information set comprises all target instances to be amplified and the corresponding original annotation information;
performing spatial pose transformation on all the target instances to be amplified in the annotation information set, and correspondingly adjusting the original annotation information corresponding to each target instance to be amplified in the annotation information set, to obtain a transformation annotation information set corresponding to each initial image;
and filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image.
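Claim 4 requires a spatial pose transform of each instance together with a matching adjustment of its annotation. As one illustration (the claim does not name the transform), a horizontal flip with the bounding box mirrored about the image's vertical centre line might look like this:

```python
import numpy as np

def transform_instance(patch, box, img_w):
    """Apply a spatial pose transform (here: a horizontal flip, an
    assumed choice) to an instance patch and adjust its bounding box
    (x1, y1, x2, y2) accordingly within an image of width img_w.
    """
    x1, y1, x2, y2 = box
    flipped = np.fliplr(patch)
    # Mirror the box horizontally about the image centre line.
    new_box = (img_w - x2, y1, img_w - x1, y2)
    return flipped, new_box

patch = np.array([[1, 2],
                  [3, 4]])
flipped, new_box = transform_instance(patch, (10, 5, 12, 7), img_w=100)
```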
5. The method of claim 4, wherein the filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image comprises:
determining a first target instance to be amplified among all target instances to be amplified according to the transformation annotation information set of each initial image, and determining a preset number of placement positions in the corresponding initial image according to the size of the first target instance to be amplified; each placement position is located in a blank background region;
calculating a similarity between the first target instance to be amplified and each placement position, to obtain the similarity corresponding to each placement position;
determining the maximum similarity according to all the similarities, and filling the placement position corresponding to the maximum similarity according to the first target instance to be amplified and the adjusted original labeling information to obtain a new target instance;
updating the original annotation information in the transformation annotation information set according to the new target instance, and performing mask filling on the same region of the filling mask map to obtain a new initial image corresponding to each initial image;
and obtaining a local amplified image corresponding to each initial image according to the new initial image.
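Claim 5 scores each candidate placement by its similarity to the instance and pastes at the maximum. The claim does not define the similarity measure; the sketch below assumes a normalised grayscale-histogram intersection between each candidate background patch and the instance's original surrounding background.

```python
import numpy as np

def best_placement(image, instance_bg, candidates):
    """Pick the candidate position whose background patch most
    resembles the instance's original surrounding background.

    Similarity is a histogram intersection over 16 grayscale bins,
    which is an assumption; the claim only requires a similarity
    score per placement position and selection of the maximum.
    """
    ph, pw = instance_bg.shape[:2]

    def hist(patch):
        h, _ = np.histogram(patch, bins=16, range=(0, 256), density=True)
        return h

    ref = hist(instance_bg)
    scores = []
    for (y, x) in candidates:
        patch = image[y:y + ph, x:x + pw]
        scores.append(np.minimum(ref, hist(patch)).sum())
    return candidates[int(np.argmax(scores))]

# Left half dark, right half bright; a dark-background instance
# should be placed on the dark side.
image = np.zeros((64, 64))
image[:, 32:] = 200
pos = best_placement(image, np.zeros((8, 8)), [(10, 40), (10, 5)])
```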
6. The method of claim 5, wherein obtaining the local amplification image corresponding to each initial image according to the new initial image comprises:
judging whether a transformation target instance to be amplified remains in the transformation annotation information set;
if yes, taking the new initial image as the initial image, determining a second target instance to be amplified from the transformation annotation information set, and carrying out data local amplification on the initial image according to the second target instance to be amplified;
if not, determining the new initial image as a local amplification image, and obtaining the local amplification image corresponding to each initial image when the data local amplification of each initial image is completed.
7. The method of claim 4, wherein training a preset background-class association model according to the preset number of local amplification images to obtain a target background-class association model comprises:
obtaining an extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image;
obtaining a background-target image data set according to all the extended background images of the preset number of initial images, and dividing the background-target image data set to obtain a training set and a testing set;
and training the preset background-category association model according to the training set and the testing set to obtain a target background-category association model.
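Claim 7 does not specify the architecture of the background-category association model; as a stand-in, the sketch below uses a nearest-centroid classifier over flattened background patches. The class name and the centroid approach are illustrative assumptions — the claim only requires a model fit on background/category pairs and evaluated on a test split.

```python
import numpy as np

class BackgroundCategoryModel:
    """Minimal background -> category association model: a
    nearest-centroid classifier over flattened background patches."""

    def fit(self, patches, labels):
        # One mean-patch centroid per category.
        self.centroids = {}
        for c in set(labels):
            xs = [p.ravel() for p, l in zip(patches, labels) if l == c]
            self.centroids[c] = np.mean(xs, axis=0)
        return self

    def predict(self, patch):
        # Assign the category whose centroid is nearest in pixel space.
        v = patch.ravel()
        return min(self.centroids,
                   key=lambda c: np.linalg.norm(v - self.centroids[c]))

# Two toy categories: dark "road" backgrounds vs bright "sky" ones.
model = BackgroundCategoryModel().fit(
    [np.zeros((4, 4)), np.full((4, 4), 255.0)], ["road", "sky"])
pred = model.predict(np.full((4, 4), 10.0))
```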
8. The method for balancing target detection data according to claim 1, wherein the obtaining the extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified in each initial image includes:
dividing the background images of the preset number of placement positions of each target instance to be amplified of each initial image to obtain a preset number of background images corresponding to each target instance to be amplified;
storing the preset number of background images according to the category information in the adjusted original annotation information of each target instance to be amplified, to obtain category background images corresponding to each target instance to be amplified;
and according to the number of images of each category among the category background images of all the target instances to be amplified, performing flip expansion on the category background images to obtain the extended background image corresponding to each initial image.
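Claim 8's flip expansion can be read as padding under-represented categories with flipped copies until every category has as many background images as the largest one. Both the balancing target and the choice of a horizontal flip as the sole expansion operation are assumptions here.

```python
import numpy as np

def flip_expand(category_backgrounds):
    """Balance per-category background image counts by flipping.

    Categories with fewer backgrounds are padded with horizontally
    flipped copies of their existing images until every category has
    as many backgrounds as the largest one.
    """
    target = max(len(v) for v in category_backgrounds.values())
    expanded = {}
    for cat, imgs in category_backgrounds.items():
        out = list(imgs)
        i = 0
        while len(out) < target:
            # Cycle through the originals, appending flipped copies.
            out.append(np.fliplr(imgs[i % len(imgs)]))
            i += 1
        expanded[cat] = out
    return expanded

balanced = flip_expand({"a": [np.ones((2, 2))],
                        "b": [np.zeros((2, 2))] * 3})
```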
9. The method of claim 1, wherein the performing data global augmentation on the target detection dataset according to the target background-class association model, the class expansion ratio, the scale ratio rule, and the target spatial distribution index to obtain a balanced target detection dataset in a balanced state comprises:
obtaining a preset number of target instances suitable for copy expansion according to the original annotation information of each initial image in the target detection data set;
controlling the category and scale proportion of the preset number of target instances according to the category expansion ratio and the scale proportion rule to obtain a preset number of controlled target instances, and determining the initial images whose target spatial distribution indexes are smaller than a preset threshold value as initial images to be expanded, so as to obtain a preset number of initial images to be expanded;
obtaining a background region set, which does not overlap the preset number of controlled target instances, in each initial image to be expanded according to the missing scale proportion of each initial image to be expanded;
determining a preset number of regions to be expanded in the background region set according to the target space distribution index;
performing data global amplification on the preset number of regions to be expanded according to the target background-category association model to obtain an amplified image for each initial image to be expanded;
judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold;
if not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image;
if yes, obtaining a balance target detection data set in a balance state according to the amplified images, which are larger than the expected balance threshold, in the preset number of initial images to be expanded.
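The loop structure of claim 9 — amplify, re-score the spatial distribution index, and repeat until it exceeds the expected balance threshold — can be sketched generically. The round limit is an added safeguard the claim does not mention, and the scoring and amplification functions are passed in as placeholders.

```python
def amplify_until_balanced(image, index_fn, amplify_fn,
                           balance_thresh=0.5, max_rounds=20):
    """Repeat global amplification until the spatial distribution
    index of the image exceeds the expected balance threshold
    (or a round limit is hit, an assumed safeguard).
    """
    for _ in range(max_rounds):
        if index_fn(image) > balance_thresh:
            return image  # balanced: stop amplifying
        image = amplify_fn(image)
    return image

# Toy stand-ins: the "image" is a number, its index is itself, and
# each amplification round raises the index by 0.2.
result = amplify_until_balanced(0.0,
                                index_fn=lambda x: x,
                                amplify_fn=lambda x: x + 0.2)
```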
10. A target detection data balancing apparatus, comprising:
the data acquisition module is used for acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
the data analysis module is used for analyzing the target detection data set and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
the image amplification module is used for carrying out data local amplification on the preset number of initial images according to the original annotation information to obtain a preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
and the image balancing module is used for carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
CN202410024623.6A 2024-01-08 2024-01-08 Target detection data balancing method and device Active CN117523345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410024623.6A CN117523345B (en) 2024-01-08 2024-01-08 Target detection data balancing method and device

Publications (2)

Publication Number Publication Date
CN117523345A 2024-02-06
CN117523345B 2024-04-23

Family

ID=89742474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410024623.6A Active CN117523345B (en) 2024-01-08 2024-01-08 Target detection data balancing method and device

Country Status (1)

Country Link
CN (1) CN117523345B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470187A (en) * 2018-02-26 2018-08-31 华南理工大学 A kind of class imbalance question classification method based on expansion training dataset
CN110222712A (en) * 2019-04-30 2019-09-10 杰创智能科技股份有限公司 A kind of more special algorithm of target detection based on deep learning
CN112328588A (en) * 2020-11-27 2021-02-05 哈尔滨工程大学 Industrial fault diagnosis unbalanced time sequence data expansion method
CN112560960A (en) * 2020-12-16 2021-03-26 北京影谱科技股份有限公司 Hyperspectral image classification method and device and computing equipment
CN113158891A (en) * 2021-04-20 2021-07-23 杭州像素元科技有限公司 Cross-camera pedestrian re-identification method based on global feature matching
CN114943300A (en) * 2022-06-02 2022-08-26 西安电子科技大学 Unbalanced data classification method for generating countermeasure network based on cycle consistency
CN114972982A (en) * 2022-04-22 2022-08-30 中国电子科技集团公司第五十四研究所 Remote sensing image target detection sample unbalance processing method based on improved oversampling
WO2022247005A1 (en) * 2021-05-27 2022-12-01 平安科技(深圳)有限公司 Method and apparatus for identifying target object in image, electronic device and storage medium
CN116188900A (en) * 2023-01-10 2023-05-30 南京大学 Small sample image classification method based on global and local feature augmentation
CN116229205A (en) * 2023-01-13 2023-06-06 上海可明科技有限公司 Lipstick product surface defect data augmentation method based on small sample characteristic migration
CN116416136A (en) * 2023-04-17 2023-07-11 北京卫星信息工程研究所 Data amplification method for ship target detection of visible light remote sensing image and electronic equipment
WO2023130648A1 (en) * 2022-01-10 2023-07-13 苏州浪潮智能科技有限公司 Image data enhancement method and apparatus, computer device, and storage medium
CN116580188A (en) * 2023-03-22 2023-08-11 北京科技大学 Data augmentation method and system based on multi-mode image federal segmentation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MAODING ZHANG et al., "A Triplet Nonlocal Neural Network With Dual-Anchor Triplet Loss for High-Resolution Remote Sensing Image Retrieval", IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 31 December 2021, pages 2711-2723 *
XIN GAO et al., "An imbalanced binary classification method based on contrastive learning using multi-label confidence comparisons within sample neighbors pair", Neurocomputing, 31 October 2022, pages 148-164 *
LI Mengnan et al., "Research on bearing fault diagnosis under imbalanced data sets based on IWAE", Journal of Mechanical Strength, vol. 45, no. 3, 31 December 2023, pages 569-575 *

Also Published As

Publication number Publication date
CN117523345B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
US20200356818A1 (en) Logo detection
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN107330027B (en) Weak supervision depth station caption detection method
CN112381104B (en) Image recognition method, device, computer equipment and storage medium
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
US20240257423A1 (en) Image processing method and apparatus, and computer readable storage medium
CN111444807B (en) Target detection method, device, electronic equipment and computer readable medium
CN111462109A (en) Defect detection method, device and equipment for strain clamp and storage medium
CN113537070B (en) Detection method, detection device, electronic equipment and storage medium
CN112308069A (en) Click test method, device, equipment and storage medium for software interface
CN111815576B (en) Method, device, equipment and storage medium for detecting corrosion condition of metal part
CN114972947B (en) Depth scene text detection method and device based on fuzzy semantic modeling
CN112990318A (en) Continuous learning method, device, terminal and storage medium
CN114429577B (en) Flag detection method, system and equipment based on high confidence labeling strategy
CN110378421A (en) A kind of coal-mine fire recognition methods based on convolutional neural networks
CN114049568A (en) Object shape change detection method, device, equipment and medium based on image comparison
Jiang et al. YOLOv3_slim for face mask recognition
Shuai et al. Regression convolutional network for vanishing point detection
CN110309825A (en) Uighur detection method, system and electronic equipment under a kind of complex background
CN115082758B (en) Training method of target detection model, target detection method, device and medium
CN117523345B (en) Target detection data balancing method and device
CN109799905B (en) Hand tracking method and advertising machine
CN110516094A (en) De-weight method, device, electronic equipment and the storage medium of class interest point data
Zhou et al. Self-supervised saliency estimation for pixel embedding in road detection
CN112446231A (en) Pedestrian crossing detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant