CN117523345B - Target detection data balancing method and device - Google Patents

Target detection data balancing method and device

Info

Publication number
CN117523345B
CN117523345B (application CN202410024623.6A)
Authority
CN
China
Prior art keywords
target
initial image
amplified
image
background
Prior art date
Legal status
Active
Application number
CN202410024623.6A
Other languages
Chinese (zh)
Other versions
CN117523345A (en)
Inventor
罗芳
马佳星
周莹静
颜昆
何芷馨
罗妍婕
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202410024623.6A
Publication of CN117523345A
Application granted
Publication of CN117523345B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Abstract

The invention provides a target detection data balancing method and device, comprising the following steps: analyzing the target detection data set and setting a category expansion ratio, a scale proportion rule and a target spatial distribution index for each initial image; carrying out data local amplification on the initial images according to the original labeling information to obtain local amplified images, and training a preset background-category association model according to a preset number of the local amplified images to obtain a target background-category association model; and carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target spatial distribution index to obtain a balanced target detection data set in a balanced state. By setting the background-category association model, the category expansion ratio, the scale proportion rule and the target spatial distribution index, the invention obtains a balanced target detection data set and solves the technical problem of data imbalance in target detection data.

Description

Target detection data balancing method and device
Technical Field
The invention relates to the technical field of image data detection, in particular to a target detection data balancing method and device.
Background
Object detection is one of the core tasks in the field of computer vision; it aims to locate and classify objects of interest in images. Localization and classification depend partly on the characteristic information of the object itself and partly on the context information of its environment, which helps the model fit quickly. In many target detection data sets, however, data imbalance is a core problem, which can be categorized by cause into category imbalance, scale imbalance and spatial imbalance.
Category imbalance: owing to differences in acquisition environments and manual labeling, existing target detection datasets generally suffer from an uneven distribution of the number of samples across categories, which may lead to poor model performance on minority categories because there are not enough samples to learn their characteristics.
Scale imbalance: the target objects in the data set differ greatly in size. Such a non-uniform scale distribution causes problems for the object detection model when handling objects of different sizes; the model may detect large objects more easily while ignoring small objects, or produce false or missed detections when dealing with small objects.
Spatial imbalance: the target objects are unevenly distributed across different regions of the images in the target detection dataset. Some regions may contain many targets while others contain few or none. Such imbalance may cause the target detection model to over-fit certain regions and under-fit others during training.
Therefore, there is an urgent need for a target detection data balancing method and apparatus that address the category imbalance, scale imbalance and spatial imbalance of data in existing target detection data sets, which otherwise lead to poor performance when detecting images.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a target detection data balancing method and device to solve the technical problem of poor detection performance caused by the category imbalance, scale imbalance and spatial imbalance of data in target detection data sets in the prior art.
In one aspect, the present invention provides a target detection data balancing method, including:
acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
Analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
Carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
And carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
In some possible implementations, the analyzing the target detection data set, setting a class expansion ratio, a scale ratio rule, and a target spatial distribution index corresponding to each initial image, includes:
Analyzing the target detection data set to obtain class distribution conditions of all the examples, determining expansion probability according to differences among the number of examples of each class in the class distribution conditions, and further determining class expansion ratio according to the expansion probability;
performing scale analysis on all original marked information in the target detection data set to obtain scale distribution conditions, and determining scale proportion rules according to the scale distribution conditions;
and determining a filling mask map according to the original labeling information corresponding to each initial image, and analyzing the filling mask map to obtain a target space distribution index corresponding to each initial image.
In some possible implementations, the analyzing the filling mask map to obtain the target spatial distribution index corresponding to each initial image includes:
Dividing the filling mask image of each initial image to obtain grid areas consisting of a preset number of grids;
Performing concentration calculation on each grid in the grid area to obtain target concentration;
And determining the region type corresponding to each grid according to the target density, and determining the target spatial distribution index of the initial image according to all the region types of the preset number of grids in the grid region.
In some possible implementations, the performing data local amplification on the preset number of initial images according to the original labeling information to obtain a preset number of local amplified images includes:
performing instance screening on all instances in the filling mask graph of each initial image to obtain non-overlapped target instances to be amplified, and acquiring the target instances to be amplified, which contain background information, in each initial image to obtain a labeling information set corresponding to each initial image; the annotation information set comprises all target instances to be augmented and the corresponding original annotation information;
performing spatial gesture transformation on all the target instances to be amplified in the annotation information set, correspondingly adjusting original annotation information corresponding to each target instance to be amplified in the annotation information set, and obtaining a transformation annotation information set corresponding to each initial image;
And filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image.
In some possible implementations, the filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image includes:
Determining a first target instance to be amplified in all target instances to be amplified according to the transformation annotation information set of each initial image, and determining a preset number of placement positions in the corresponding initial image according to the size of the first target instance to be amplified; the placement position is positioned at the blank background;
Calculating the first target instance to be amplified and each placement position respectively to obtain the similarity corresponding to each placement position;
determining the maximum similarity according to all the similarities, and filling the placement position corresponding to the maximum similarity according to the first target instance to be amplified and the adjusted original labeling information to obtain a new target instance;
Updating the original annotation information in the transformation annotation information set according to the new target instance, and performing mask filling on the same region of the filling mask map to obtain a new initial image corresponding to each initial image;
and obtaining a local amplification image corresponding to each initial image according to the new initial image.
In some possible implementations, the obtaining the local amplification image corresponding to each initial image according to the new initial image includes:
Judging whether a transformation target instance to be amplified exists in the transformation labeling information set or not;
if yes, replacing the new initial image with an initial image, determining a second target instance to be amplified from the transformation annotation information set, and carrying out data local amplification on the initial image according to the second target instance to be amplified;
If not, determining the new initial image as a local amplification image, and obtaining the local amplification image corresponding to each initial image when the data local amplification of each initial image is completed.
In some possible implementations, training a preset background-class association model according to the preset number of local augmentation images to obtain a target background-class association model includes:
obtaining an extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image;
Obtaining a background-target image data set according to all the expanded background images of the preset number of initial images, and dividing the background-target image data set to obtain a training set and a testing set;
and training the preset background-category association model according to the training set and the testing set to obtain a target background-category association model.
In some possible implementations, the obtaining the extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image includes:
Dividing the background images of the preset number of placement positions of each target instance to be amplified of each initial image to obtain a preset number of background images corresponding to each target instance to be amplified;
storing the preset number of background images according to the category information in the adjusted original labeling information of each target instance to be amplified to obtain a category background image corresponding to each target instance to be amplified;
And according to the image quantity of each category in the background images of all categories of all the target instances to be amplified, carrying out overturn expansion on the background images of all categories to obtain expansion background images corresponding to each initial image.
In some possible implementations, the performing data global augmentation on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule, and the target spatial distribution index to obtain a balanced target detection data set in a balanced state includes:
Obtaining a preset number of target examples suitable for copy expansion according to the original labeling information of each initial image in the target detection data set;
Controlling the category and scale proportion of the preset number of target examples according to the category expansion ratio and the scale proportion rule to obtain the preset number of control target examples, and determining the initial images with the target space distribution indexes smaller than a preset threshold value as initial images to be expanded, so as to obtain the preset number of initial images to be expanded;
Obtaining a background area set which is not overlapped with the preset number of control target examples in each initial image to be expanded according to the scale proportion of the missing of each initial image to be expanded;
Determining a preset number of regions to be expanded in the background region set according to the target space distribution index;
Performing data global amplification on the preset number of areas to be amplified according to the target background-category association model to obtain amplified images after the amplification of each initial image to be amplified;
judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold value or not;
If not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image;
if yes, obtaining a balance target detection data set in a balance state according to the amplified images, which are larger than the expected balance threshold, in the preset number of initial images to be expanded.
On the other hand, the invention also provides a target detection data balancing device, which comprises:
The data acquisition module is used for acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
the data analysis module is used for analyzing the target detection data set and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
The image amplification module is used for carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplification images, and training a preset background-category association model according to the preset number of local amplification images to obtain a target background-category association model;
And the image balancing module is used for carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
The beneficial effects of adopting the above embodiments are as follows: in the target detection data balancing method, a background-category association model is set, and data local amplification is performed on each initial image using its original labeling information to obtain local amplified images; the background-category association model can then be trained with the local amplified images, so that over-fitting or under-fitting of the background-category association model during region embedding can be avoided. Further, by analyzing the target detection data set, the category expansion ratio, the scale proportion rule and the target spatial distribution index corresponding to each initial image are set, so that imbalanced initial images can be identified according to the target spatial distribution index and globally amplified according to the category expansion ratio and the scale proportion rule to obtain spatially balanced initial images and, in turn, the balanced target detection data set. The detection effect when detecting images based on the balanced target detection data set is thereby improved, solving the technical problem of data imbalance in target detection data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings needed in the description of the embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an embodiment of a method for balancing object detection data according to the present invention;
FIG. 2 is a schematic diagram of a data local amplification structure according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the step S104 of FIG. 1 according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of comparison of scale distribution adjustment provided by the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a target detection data balancing apparatus according to the present invention;
fig. 6 is a schematic structural diagram of an embodiment of an electronic device according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor systems and/or microcontroller systems.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The embodiment of the invention provides a target detection data balancing method and device, which are respectively described below.
Fig. 1 is a flow chart of an embodiment of a target detection data balancing method according to the present invention, where, as shown in fig. 1, the target detection data balancing method includes:
S101, acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
S102, analyzing a target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on the initial image under the condition of spatial unbalance;
S103, carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
And S104, carrying out data global augmentation on the target detection data set according to the target background-category association model, the category augmentation ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
Compared with the prior art, in the target detection data balancing method provided by the embodiment of the invention, a background-category association model is set and data local amplification is performed on each initial image using its original labeling information to obtain local amplified images, with which the background-category association model can be trained, so that over-fitting or under-fitting of the background-category association model during region embedding can be avoided. Further, by analyzing the target detection data set, the category expansion ratio, the scale proportion rule and the target spatial distribution index corresponding to each initial image are set, so that imbalanced initial images can be identified according to the target spatial distribution index and globally amplified according to the category expansion ratio and the scale proportion rule to obtain spatially balanced initial images and, in turn, the balanced target detection data set; the detection effect when detecting images based on the balanced target detection data set is thereby improved, solving the technical problem of data imbalance in target detection data.
In a specific embodiment of the present invention, the target detection data set may be VOC2007+12, divided into two parts: a training set of 15412 images and a testing set of 1713 images; the target detection data set used in the subsequent steps is the training set. The target detection data set may include images and the original labeling information corresponding to each image, the original labeling information being the information obtained by annotating the image.
For the case in which the visual context backgrounds of same-category targets within a single initial image of the target detection dataset are similar, a fine-grained background embedding scheme is proposed and a Data Local Augmentation (DLA) method is designed, on the premise that similar backgrounds within a single image belong to the same visual context environment. For the target detection data set, regions of a single image whose background is similar to that of an annotated target are reconstructed with copies according to the original labeling information, where the spatial reconstruction of the original target includes basic image processing operations such as flipping, rotation and scaling.
For the case in which the visual context information of same-category targets across multiple initial images of the target detection data set is not uniform, a coarse-grained aligned target enhancement scheme is proposed and a Background-to-Category Predictor (BCP), i.e. the background-category association model, is designed, which infers the target categories associated with an input background image, effectively improving the accuracy and interpretability of visual scene analysis. The background-category association model (BCP) uses ResNet-18 as the feature extraction network and predicts the probability of each category existing in the background with a fully connected layer.
To train the BCP model, a Background-to-Category Dataset (BCD) is constructed: the BCD is built by using the data local amplification method (DLA) to cut out background-similar regions and label them with the corresponding target category, and various image enhancement schemes such as rotation and flipping are applied to improve the quality of the background-target image dataset (BCD).
A Data Global Augmentation (DGA) method is designed, which first extracts a random background image from an initial image according to a set scale size; second, it acquires the category association corresponding to the background using the background-category association model (BCP); finally, it selects target images of the associated categories from all target instances and copies them into the background.
In some embodiments of the present invention, step S102 includes:
Analyzing the target detection data set to obtain class distribution conditions of all the examples, determining expansion probability according to differences among the number of examples of each class in the class distribution conditions, and further determining class expansion ratio according to the expansion probability;
Performing scale analysis on all original marked information in the target detection data set to obtain scale distribution conditions, and determining scale proportion rules according to the scale distribution conditions;
And determining a filling mask diagram according to the original labeling information corresponding to each initial image, and analyzing the filling mask diagram to obtain a target space distribution index corresponding to each initial image.
In a specific embodiment of the present invention, after the target detection dataset is comprehensively analyzed, the category distribution, scale distribution and spatial distribution in the target detection dataset may be obtained. By obtaining the category distribution of the target detection dataset, the differences between the numbers of instances of each category can be abstracted into an expansion probability, from which the category expansion ratio is obtained; the Category Expansion Ratio (CER) is shown in formula (1):
(1)
In the formula, num is the number of examples corresponding to each class in the target detection data set, and i is the ith class.
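For illustration, a minimal sketch of this category-distribution analysis is given below. Since formula (1) is not reproduced above, the exact form of the CER is an assumption; the sketch simply makes rarer categories receive a larger expansion ratio relative to the most frequent category.

```python
from collections import Counter

def class_expansion_ratio(instance_labels):
    """Sketch of the category-distribution analysis; `instance_labels` is a
    list of class labels over all annotations in the dataset.  The exact CER
    formula (1) is not reproduced in the text, so the form below is assumed:
    rarer classes get a larger expansion ratio."""
    counts = Counter(instance_labels)          # Num_i per class i
    max_count = max(counts.values())
    return {cls: (max_count - n) / max_count for cls, n in counts.items()}

# Example: "bird" is rare, so it receives the largest ratio.
ratios = class_expansion_ratio(["person"] * 80 + ["car"] * 15 + ["bird"] * 5)
```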
The scale distribution is obtained by counting all original labeling information of all initial images in the target detection data set according to a scale classification standard, and a scale proportion rule can then be formulated from the scale distribution; the specific scale classification standard can be set according to the actual situation and is not limited in the embodiments of the present invention. The scale proportion (scale) is shown in formula (2):
(2)
Wherein w and h are the width and height of the initial image in the original labeling information respectively.
The Scale Ratio Rule (SRR) is as shown in formula (3):
(3)
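A minimal sketch of the scale statistics follows. Since formula (2) is not reproduced above, the sketch assumes that the scale is the geometric mean of the annotated width and height w and h, and the small/medium/large thresholds are illustrative values, not taken from the text.

```python
import math

def scale_of(box_w, box_h):
    # Formula (2) is not reproduced; the geometric mean of the annotated
    # width and height is an assumed stand-in.
    return math.sqrt(box_w * box_h)

def scale_distribution(boxes, small=32.0, large=96.0):
    """Count small/medium/large instances; the bucket thresholds are
    illustrative, not values from the patent text."""
    dist = {"small": 0, "medium": 0, "large": 0}
    for w, h in boxes:
        s = scale_of(w, h)
        key = "small" if s < small else "large" if s > large else "medium"
        dist[key] += 1
    return dist
```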
To evaluate the degree of imbalance of the target spatial distribution in the initial image and the effect of spatial balancing in the DGA (Data Global Augmentation) scheme, a target spatial distribution index (TSD) is designed according to the target instance existence principle and the fill mask map Mask_i. The TSD lies in the range 0-1, and the closer it is to 0, the more severe the spatial distribution imbalance of the image. The fill mask map Mask_i of each initial image in the target detection dataset may be generated according to the original annotation information.
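The following sketch illustrates how Mask_i may be generated from the original annotation information; the (x1, y1, x2, y2) box format is an assumption made for illustration.

```python
import numpy as np

def build_fill_mask(image_h, image_w, boxes):
    """Sketch of generating the fill mask Mask_i from the original
    annotation information: pixels covered by an annotated box are set to 1,
    background stays 0.  The (x1, y1, x2, y2) box format is assumed."""
    mask = np.zeros((image_h, image_w), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[int(y1):int(y2), int(x1):int(x2)] = 1
    return mask
```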
In some embodiments of the present invention, analyzing the filling mask map to obtain a target spatial distribution index corresponding to each initial image includes:
Dividing a filling mask image of each initial image to obtain grid areas formed by a preset number of grids;
performing concentration calculation on each grid in the grid area to obtain target concentration;
And determining the region type corresponding to each grid according to the target density, and determining the target spatial distribution index of the initial image according to all the region types of the preset number of grids in the grid region.
In a specific embodiment of the present invention, the fill mask map Mask_i is first divided into a specified number of grids (Grid_xy) to obtain a grid area composed of a preset number of grids, and the target density is then calculated in each Grid_xy by traversing the grid area of Mask_i. The number of grids may be controlled according to the average length-width proportion of the target instances in the target data set, which is not limited in the embodiments of the present invention. The calculation of the target density in each Grid_xy is shown in formula (4):
(4)
The target density is measured by counting the number of elements with value 1 in the grid; if the count exceeds a set density threshold (threshold), the grid cell is considered a dense area. The specific density threshold can be set according to the actual situation and is not limited in the embodiments of the present invention. The existence of a dense region is shown in formula (5):
(5)
After traversing all Grid_xy, the number of dense areas is counted, and the overall spatial distribution is calculated according to the size and number of each Grid_xy. This procedure is used to evaluate the distribution of objects in the image and determine the degree of spatial imbalance.
The target spatial distribution index (TSD) is calculated as shown in equation (6):
(6)
where m and n are the number of grid rows and columns, respectively.
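A minimal sketch of the TSD computation is given below. Since formulas (4)-(6) are not reproduced above, the per-grid density is assumed to be the fraction of 1-valued mask pixels and the TSD the ratio of dense grid cells; both are assumptions consistent with the 0-1 range described above.

```python
import numpy as np

def target_spatial_distribution(mask, m=4, n=4, threshold=0.2):
    """Sketch of the TSD computation: split Mask_i into an m x n grid, call a
    cell dense when the fraction of 1-valued pixels exceeds `threshold`, and
    return the ratio of dense cells, which lies in [0, 1].  The exact
    expressions of formulas (4)-(6) are assumed."""
    h, w = mask.shape
    dense = 0
    for i in range(m):
        for j in range(n):
            cell = mask[i * h // m:(i + 1) * h // m, j * w // n:(j + 1) * w // n]
            if cell.size == 0:
                continue
            dense += int(cell.mean() > threshold)   # fraction of pixels equal to 1
    return dense / (m * n)
```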
In some embodiments of the present invention, step S103 includes:
Performing instance screening on all instances in the filling mask graph of each initial image to obtain non-overlapped target instances to be amplified, and acquiring the target instances to be amplified, which contain background information, in each initial image to obtain a labeling information set corresponding to each initial image; the annotation information set comprises all target instances to be augmented and corresponding original annotation information;
Carrying out space gesture transformation on all target instances to be amplified in the annotation information set, correspondingly adjusting original annotation information corresponding to each target instance to be amplified in the annotation information set, and obtaining a transformation annotation information set corresponding to each initial image;
and filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplification image corresponding to each initial image.
In a specific embodiment of the present invention, instance screening is performed on all instances in the fill mask map of each initial image to obtain non-overlapping target instances to be amplified, and the target instances to be amplified containing background information in each initial image are collected to obtain the annotation information set corresponding to each initial image, denoted O_i{T, B}. The annotation information set may include all target instances T_i to be amplified and the corresponding original labeling information B_i. Spatial pose transformation (flipping and scaling) is performed on all target instances to be amplified in O_i, and the corresponding original labeling information in the set is adjusted accordingly, so as to obtain the transformation annotation information set corresponding to each initial image, thereby amplifying the number of effective targets and enriching the selectable backgrounds.
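The following sketch illustrates the spatial pose transformation (flip and scale) of one target instance to be amplified; the dictionary annotation format and the nearest-neighbour resizing are assumptions made to keep the example dependency-free.

```python
import numpy as np

def spatial_transform(instance_patch, box, scale=0.75, flip=True):
    """Sketch of the spatial pose transform applied to a target instance to
    be amplified; `instance_patch` is an HxWxC crop and `box` is a dict
    carrying its class label and size (format assumed)."""
    patch = instance_patch[:, ::-1] if flip else instance_patch   # horizontal flip
    new_h = max(1, int(patch.shape[0] * scale))
    new_w = max(1, int(patch.shape[1] * scale))
    # Nearest-neighbour resize keeps the sketch dependency-free.
    ys = np.linspace(0, patch.shape[0] - 1, new_h).astype(int)
    xs = np.linspace(0, patch.shape[1] - 1, new_w).astype(int)
    patch = patch[ys][:, xs]
    new_box = dict(box, width=new_w, height=new_h)                 # adjusted annotation
    return patch, new_box
```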
In some embodiments of the present invention, as shown in fig. 2, filling an initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image, including:
S201, determining a first object instance to be amplified in all object instances to be amplified according to a transformation annotation information set of each initial image, and determining a preset number of placement positions in the corresponding initial image according to the size of the first object instance to be amplified; the placement position is positioned at the blank background;
s202, calculating a first target instance to be amplified and each placement position respectively to obtain the similarity corresponding to each placement position;
s203, determining the maximum similarity according to all the similarities, and filling the placement position corresponding to the maximum similarity according to the first target instance to be amplified and the adjusted original labeling information to obtain a new target instance;
S204, updating original annotation information in the transformation annotation information set according to the new target instance, and performing mask filling on the same region of the filling mask map to obtain a new initial image corresponding to each initial image;
S205, obtaining a local amplification image corresponding to each initial image according to the new initial image.
In the embodiment of the present invention, the process of obtaining the local amplified image is the same for every initial image; taking one initial image as an example, its transformation annotation information set can be traversed, and T_j and B_j corresponding to the first target instance to be augmented are obtained from the set. Suitable placement positions are then found in the initial image according to the size of the first target instance to be augmented T_j; there may be multiple placement positions. The initial image is traversed according to the length and width of T_j, the regions where filled targets appear at the corresponding positions of the fill mask map Mask_i are skipped, and only blank backgrounds C_k, i.e. the preset number of placement positions, are retained, so as to ensure that the augmented target does not overlap with the original targets.
In order to preserve, as much as possible, the similarity between the first target instance to be amplified T_j and the fine-grained background information of the placement position, the degree of fit between T_j and C_k is evaluated by a similarity calculation algorithm. A fixed proportion of the surrounding background image (I_T, I_C) and the background histogram information (H_T, H_C) around the periphery of T_j and C_k are first extracted; the algorithm takes into account the pixel differences, histogram differences and structural variations of the background around the target instance. The pixel gap is shown in formula (7):
(7)
The histogram difference formula is:
(8)
the structural variation formula is:
(9)
In the formulas, the quantities denote the luminance mean of the image, the luminance standard deviation, and the luminance covariance between the two images, respectively; a constant is included to avoid the case where the denominator is zero and is usually set to a very small positive number.
The similarity calculation formula is as follows:
(10)
where the scoring weights may be set to 0.7, 0.2 and 0.1, respectively.
All the similarities of the candidate background positions in the initial image can be sorted, the placement position with the maximum similarity is determined and selected, and that designated placement position in the initial image is copy-filled with T_j and its adjusted labeling information to obtain a new target instance; the original annotation information in the transformation annotation information set is then updated, and mask filling is performed on the same region of the fill mask map Mask_i to obtain a new initial image corresponding to each initial image.
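A minimal sketch of the fit evaluation follows. Since formulas (7)-(10) are not reproduced above, the pixel, histogram and structural terms below are assumed stand-ins (normalised absolute difference, histogram correlation and an SSIM-style term), combined with the 0.7/0.2/0.1 weights mentioned above; equal-sized greyscale background crops are assumed.

```python
import numpy as np

def placement_similarity(bg_target, bg_candidate, weights=(0.7, 0.2, 0.1)):
    """Sketch of the fit evaluation between the background around T_j and a
    candidate position C_k.  Both crops are assumed to have the same shape;
    the three terms are assumed stand-ins for formulas (7)-(9), each scaled
    so that larger roughly means more similar."""
    a = bg_target.astype(np.float64)
    b = bg_candidate.astype(np.float64)
    # Pixel term: 1 - normalised mean absolute difference.
    pixel = 1.0 - np.abs(a - b).mean() / 255.0
    # Histogram term: correlation of grey-level histograms (nan guarded).
    ha, _ = np.histogram(a, bins=32, range=(0, 255), density=True)
    hb, _ = np.histogram(b, bins=32, range=(0, 255), density=True)
    hist = float(np.nan_to_num(np.corrcoef(ha, hb)[0, 1]))
    # Structural term: SSIM-style luminance/contrast comparison.
    c = 1e-6
    ssim = ((2 * a.mean() * b.mean() + c) * (2 * np.cov(a.ravel(), b.ravel())[0, 1] + c)) / \
           ((a.mean() ** 2 + b.mean() ** 2 + c) * (a.var() + b.var() + c))
    w1, w2, w3 = weights
    return w1 * pixel + w2 * hist + w3 * ssim
```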
In some embodiments of the present invention, obtaining a locally amplified image corresponding to each initial image according to the new initial image includes:
Judging whether a transformation target instance to be amplified exists in the transformation labeling information set or not;
if yes, replacing the new initial image with the initial image, determining a second target instance to be amplified from the transformation annotation information set, and carrying out data local amplification on the initial image according to the second target instance to be amplified;
If not, determining the new initial image as a local amplification image, and obtaining the local amplification image corresponding to each initial image when the data local amplification of each initial image is completed.
In a specific embodiment of the present invention, it may be determined whether a transformation target instance to be amplified still exists in the transformation annotation information set. If so, the new initial image replaces the initial image, a second target instance to be amplified is determined from the transformation annotation information set, and data local amplification is performed on the initial image according to the second target instance to be amplified; this process is consistent with the steps for the first target instance to be amplified, so that steps S201 to S205 loop. When no transformation target instance to be amplified remains in the transformation annotation information set, the loop ends and the new initial image is determined to be a local amplified image; when the data local amplification of all initial images is completed, the local amplified image corresponding to each initial image is obtained.
In some embodiments of the present invention, step S103 includes:
obtaining an extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image;
Obtaining a background-target image data set according to all the expanded background images of the initial images with the preset number, and dividing the background-target image data set to obtain a training set and a testing set;
training the preset background-category association model according to the training set and the testing set to obtain the target background-category association model.
In a specific embodiment of the present invention, a background-target image dataset (BCD) is built and a background-class association model (BCP) is designed to build the association of background and class from the blank background C obtained in the data local augmentation method (DLA).
In some embodiments of the present invention, obtaining an extended background image corresponding to each initial image according to a background image of a preset number of placement positions of each target instance to be amplified of each initial image, including:
dividing the background images of the preset number of placement positions of each target instance to be amplified of each initial image to obtain the preset number of background images corresponding to each target instance to be amplified;
storing a preset number of background images according to category information in the adjusted original labeling information of each target instance to be amplified to obtain a category background image corresponding to each target instance to be amplified;
and according to the image quantity of each category in the background images of all categories of all the object instances to be amplified, carrying out overturn expansion on the background images of all categories to obtain an expansion background image corresponding to each initial image.
In a specific embodiment of the present invention, the background images (blank backgrounds) of the preset number of placement positions of each target instance to be amplified of each initial image are segmented to obtain the preset number of background images corresponding to each target instance to be amplified, and these background images are stored according to the category information in the adjusted B_j of the target instance to be amplified, with filenames of the form "{class}_{index}.jpg", so as to obtain the category background images corresponding to each target instance to be amplified. Because the number of images of each category among the category background images of all target instances to be amplified is not equal, expansion operations such as flipping and rotation can be adopted for balancing, with data enhancement of different intensities applied to categories with different image counts; the specific operations include flipping [horizontal, vertical, horizontal-and-vertical] (flip [1, 0, -1]) and rotating [90° right, 90° left, 180°] (angle [90, -90, 180]), so as to obtain the expanded background image corresponding to each initial image.
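The following sketch illustrates the flip/rotation expansion of the per-category background images; which variants are applied to a given category depends on its image count and is not reproduced here, so all variants are generated for illustration.

```python
import numpy as np

def expand_background_images(images):
    """Sketch of the flip/rotation expansion applied to per-category
    background images (HxWxC arrays).  In the method, the set of variants
    applied depends on the per-category image counts; here every variant is
    generated for illustration."""
    expanded = []
    for img in images:
        expanded.append(img)
        expanded.append(img[:, ::-1])            # horizontal flip
        expanded.append(img[::-1, :])            # vertical flip
        expanded.append(img[::-1, ::-1])         # horizontal + vertical flip
        expanded.append(np.rot90(img, k=-1))     # 90° right
        expanded.append(np.rot90(img, k=1))      # 90° left
        expanded.append(np.rot90(img, k=2))      # 180°
    return expanded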
Further, all the expanded background images of the preset number of initial images may be saved into a data set to obtain the background-target image dataset (BCD). For example, the BCD may be an amplified data set containing 14903 images over 20 categories, and it may be divided into a training set and a testing set at a ratio of 8:2; the specific division ratio can be set according to the actual situation.
Further, resNet-18 can be used as a feature extraction network of a background-class association model (BCP), a fully connected layer is used to conduct class prediction on the extracted features, and Softmax is used to return the prediction result to the probability of 0-1. Model training uses Cross entropy loss (Cross-Entropy Loss) to calculate the difference between the model output and the real label, the model can be trained through a training set and a testing set, and the model is adjusted by using random gradient descent (SGD), so that a target background-category correlation model is obtained.
In some embodiments of the present invention, as shown in fig. 3, step S104 includes:
S301, obtaining a preset number of target examples suitable for copy expansion according to original labeling information of each initial image in a target detection data set;
S302, controlling the category and scale proportion of a preset number of target examples according to the category expansion ratio and the scale proportion rule to obtain a preset number of control target examples, and determining an initial image with a target space distribution index smaller than a preset threshold value as an initial image to be expanded, so as to obtain a preset number of initial images to be expanded;
S303, obtaining a background area set which is not overlapped with a preset number of control target examples in each initial image to be expanded according to the missing scale proportion of each initial image to be expanded;
S304, determining a preset number of regions to be expanded in a background region set according to the target space distribution index;
S305, carrying out data global expansion on a preset number of areas to be expanded according to a target background-category association model to obtain an expanded image after each initial image to be expanded is expanded;
s306, judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold value;
S307, if not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image;
and S308, if so, obtaining a balance target detection data set in a balance state according to the preset number of amplified images which are larger than the expected balance threshold value in the initial images to be expanded.
In the specific embodiment of the invention, a large number of target instances suitable for copy expansion can be extracted, according to the original labeling information of each initial image in the target detection data set, by applying the non-overlapping principle and scale division, forming an instance pool; each instance retains 20% of the original background information around it, so as to keep the fine-grained background information invariant when copying. A fill mask map Mask_i is generated at the same time for calculating the target spatial distribution index. The processes of generating Mask_i and calculating the target spatial distribution index are consistent with those in step S102 and are not repeated here.
The category and scale proportions of the copied instances can be controlled according to the category expansion ratio and the scale proportion rule to obtain the preset number of control target instances. The initial images to be expanded among the preset number of initial images can be determined by calculating the target spatial distribution index (TSD) of Mask_i: an initial image whose TSD is smaller than a preset threshold is determined to be an initial image to be expanded; the specific preset threshold can be set according to the actual situation and is not limited here. If no initial image needs to be expanded, the process ends. Otherwise, the missing scale proportion of each initial image to be expanded is determined from its original labeling information, and a background region set (Region) that does not overlap with the existing target instances can then be generated, from which the regions that need to be expanded are obtained according to the target spatial distribution index process (for example, regions of images whose target spatial distribution index is less than 1 need to be expanded). The preset number of regions to be expanded in the background region set can then be determined, and data global amplification is performed on them according to the target background-category association model (BCP); specifically, category prediction information can be acquired using the BCP, and high-scoring categories exceeding a threshold are selected as the associated categories of the background, thereby ensuring the accuracy of the coarse-grained visual context.
The prediction process formula is:
(11)
where argmax(·) returns the index of the maximum value.
Multi-scale target instances of the associated category are selected from the instance pool for copying. One instance is randomly selected from the preset number of control target instances and scaled to the size of the background image while keeping its original aspect ratio. The adjusted instance is copied into the background image to obtain a new target instance; the annotation information is added to the annotation file corresponding to the initial image in the target detection data set, and mask filling is performed on the same region of the fill mask map to obtain the amplified image. The category ratio of the copies is also taken into account when selecting instances. Data local amplification and data global amplification are broadly similar: data local amplification amplifies an initial image according to the category and background picture of each target instance to be amplified within that image, while data global amplification processes and amplifies according to the categories and background pictures of all target instances across all initial images.
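The following sketch illustrates one global-amplification step: the BCP predicts the category associated with a background region (a thresholded argmax, as in formula (11)), an instance of that category is scaled to the region while keeping its aspect ratio, and it is copied in with the fill mask updated. The region format, instance-pool layout, score threshold and dictionary annotation format are assumptions, and the region is assumed large enough for the ResNet-18 backbone.

```python
import numpy as np
import torch

def resize_keep_aspect(patch, max_w, max_h):
    """Nearest-neighbour resize of an HxWxC patch to fit (max_w, max_h)
    while keeping its aspect ratio (illustrative helper)."""
    h, w = patch.shape[:2]
    s = min(max_w / w, max_h / h)
    new_h, new_w = max(1, int(h * s)), max(1, int(w * s))
    ys = np.linspace(0, h - 1, new_h).astype(int)
    xs = np.linspace(0, w - 1, new_w).astype(int)
    return patch[ys][:, xs]

def paste_instance(image, mask, region, instance_pool, model, score_thr=0.5):
    """Sketch of one DGA step.  `image` is an HxWx3 uint8 array, `region` is
    an assumed (x1, y1, x2, y2) background area, `instance_pool` is an
    assumed {class_id: [(patch, box_dict), ...]} layout, and `model` is the
    BCP sketched above."""
    x1, y1, x2, y2 = region
    crop = image[y1:y2, x1:x2].astype(np.float32) / 255.0
    inp = torch.from_numpy(crop).permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(inp), dim=1)[0]
    score, cls = probs.max(dim=0)                     # thresholded argmax, as in formula (11)
    if score.item() < score_thr or not instance_pool.get(int(cls)):
        return None                                   # no sufficiently associated category
    patch, box = instance_pool[int(cls)][0]
    patch = resize_keep_aspect(patch, x2 - x1, y2 - y1)
    h, w = patch.shape[:2]
    image[y1:y1 + h, x1:x1 + w] = patch               # copy the instance into the background
    mask[y1:y1 + h, x1:x1 + w] = 1                    # mask-fill the same region
    return dict(box, x=x1, y=y1, width=w, height=h)   # new annotation entry
```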
Further, the target spatial distribution index of the amplified image may be calculated (the calculation process is consistent with that above and is not repeated here) and compared with the expected balance threshold. If it is not greater than the threshold, data global amplification needs to be performed again on the initial image corresponding to the amplified image, and step S302 and the subsequent steps are executed again until the target spatial distribution index of the amplified image is greater than the expected balance threshold, at which point the amplified image can be regarded as spatially balanced. The processing of the other initial images to be expanded is the same, so the amplified image of each initial image to be expanded can be obtained, and a balanced target detection data set in a balanced state is thereby obtained. As shown in fig. 4, which compares the scale distribution before adjustment (left graph) and after adjustment (right graph), the distribution of scale proportions in the target detection data set globally expanded with the target background-category association model is more reasonable, and the spatial distribution of the expanded target detection data set is more uniform.
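A minimal sketch of this outer balancing loop is given below, reusing the target_spatial_distribution and paste_instance sketches above; the expected balance threshold and the round limit shown are assumptions.

```python
def balance_image(image, mask, regions, instance_pool, model,
                  balance_thr=0.6, max_rounds=10):
    """Sketch of the outer DGA loop: keep pasting associated instances into
    the regions to be expanded until the target spatial distribution index
    of the amplified image exceeds the expected balance threshold.  The
    threshold and round limit are assumptions; `paste_instance` and
    `target_spatial_distribution` refer to the sketches above."""
    annotations = []
    for _ in range(max_rounds):
        if target_spatial_distribution(mask) > balance_thr:
            break                                 # image is now spatially balanced
        for region in regions:
            new_ann = paste_instance(image, mask, region, instance_pool, model)
            if new_ann is not None:
                annotations.append(new_ann)
    return image, mask, annotations
```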
According to the embodiment of the invention, unit target instances are obtained from the original labeling information of each initial image in the target detection data set, and effective target instances that conform to the visual context are added to the target detection data set through the target background-category association model (BCP), so that global amplification of the target detection data set is realized; this solves the technical problem that randomly copying and pasting target instances with semantic-segmentation labels risks damaging the context information of the data set. Further, a background region set (Region) that does not overlap with the existing target instances is generated by means of the category expansion ratio and the scale proportion rule, and global amplification of the target detection data set is performed on this background region set; this addresses the shortcomings of the prior art, which focuses only on the background near the original target, defines the usable background too absolutely as any non-target area, ignores a large number of reasonable backgrounds that satisfy the conditions, and does not solve the imbalance problem.
In order to better implement the target detection data balancing method in the embodiment of the present invention, correspondingly, on the basis of the target detection data balancing method, the embodiment of the present invention further provides a target detection data balancing device, as shown in fig. 5, where the target detection data balancing device includes:
A data acquisition module 501 for acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
The data analysis module 502 is configured to analyze the target detection data set, and set a category expansion ratio, a scale proportion rule, and a target spatial distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on the initial image under the condition of spatial unbalance;
The image amplification module 503 is configured to perform data local amplification on a preset number of initial images according to the original labeling information to obtain a preset number of local amplified images, and train a preset background-class association model according to the preset number of local amplified images to obtain a target background-class association model;
The image balancing module 504 is configured to perform data global augmentation on the target detection data set according to the target background-category association model, the category augmentation ratio, the scale proportion rule, and the target spatial distribution index, so as to obtain a balanced target detection data set in a balanced state.
The target detection data balancing device provided in the foregoing embodiment may implement the technical solution described in the foregoing target detection data balancing method embodiment, and the specific implementation principle of each module or unit may refer to the corresponding content in the foregoing target detection data balancing method embodiment, which is not described herein again.
As shown in fig. 6, the present invention further provides an electronic device 600 accordingly. The electronic device 600 comprises a processor 601, a memory 602 and a display 603. Fig. 6 shows only a portion of the components of the electronic device 600, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
The memory 602 may be an internal storage unit of the electronic device 600 in some embodiments, such as a hard disk or memory of the electronic device 600. The memory 602 may also be an external storage device of the electronic device 600 in other embodiments, such as a plug-in hard disk provided on the electronic device 600, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like.
Further, the memory 602 may also include both an internal storage unit and an external storage device of the electronic device 600. The memory 602 is used for storing the application software installed on the electronic device 600 and various types of data.
The processor 601 may in some embodiments be a central processing unit (Central Processing Unit, CPU), a microprocessor, or another data processing chip, and is configured to run program code or process data stored in the memory 602, for example to execute the target detection data balancing method of the present invention.
The display 603 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like in some embodiments. The display 603 is used for displaying information at the electronic device 600 and for displaying a visual user interface. The components 601-603 of the electronic device 600 communicate with each other via a system bus.
In some embodiments of the present invention, when the processor 601 executes the target detection data balancing program in the memory 602, the following steps may be implemented:
acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on the initial image under the condition of spatial unbalance;
carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
And carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state.
It should be understood that the processor 601 may perform other functions in addition to the above functions when executing the target detection data balancing program in the memory 602; see in particular the description of the corresponding method embodiments above.
Further, the type of the electronic device 600 is not particularly limited, and the electronic device 600 may be a portable electronic device such as a mobile phone, a tablet computer, a personal digital assistant (PDA), a wearable device, or a laptop computer. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running iOS, Android, Microsoft, or other operating systems. The portable electronic device may also be another portable electronic device having a touch-sensitive surface (e.g., a touch panel). It should also be appreciated that in other embodiments of the invention, the electronic device 600 may not be a portable electronic device, but rather a desktop computer having a touch-sensitive surface (e.g., a touch panel).
Correspondingly, an embodiment of the present application further provides a computer readable storage medium for storing a computer readable program or instructions; when the program or instructions are executed by a processor, the method steps or functions of the target detection data balancing method provided by the above method embodiments can be realized.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by a computer program stored in a computer readable storage medium that instructs related hardware (e.g., a processor, a controller, etc.). The computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The above detailed description of the target detection data balancing method and device provided by the present invention uses specific examples to illustrate the principles and embodiments of the present invention; the above examples are only intended to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in light of the ideas of the present invention, and therefore the contents of this description should not be construed as limiting the present invention.

Claims (7)

1. A method of balancing target detection data, comprising:
acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
Analyzing the target detection data set, and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
Carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplified images, and training a preset background-category association model according to the preset number of local amplified images to obtain a target background-category association model;
performing data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state;
the analyzing the target detection data set, setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image, includes:
Analyzing the target detection data set to obtain the category distribution of all instances, determining an expansion probability according to the differences among the numbers of instances of each category in the category distribution, and further determining the category expansion ratio according to the expansion probability;
performing scale analysis on all the original labeling information in the target detection data set to obtain a scale distribution, and determining the scale proportion rule according to the scale distribution;
Determining a filling mask map according to the original labeling information corresponding to each initial image, and analyzing the filling mask map to obtain a target space distribution index corresponding to each initial image;
the step of carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain the preset number of local amplified images, comprises the following steps:
performing instance screening on all instances in the filling mask map of each initial image to obtain non-overlapping target instances to be amplified, and acquiring the target instances to be amplified, which contain background information, in each initial image to obtain an annotation information set corresponding to each initial image; the annotation information set comprises all the target instances to be amplified and the corresponding original annotation information;
performing spatial pose transformation on all the target instances to be amplified in the annotation information set, and correspondingly adjusting the original annotation information corresponding to each target instance to be amplified in the annotation information set, to obtain a transformation annotation information set corresponding to each initial image;
Filling the corresponding initial images according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain local amplified images corresponding to each initial image;
The step of performing data global augmentation on the target detection data set according to the target background-category association model, the category augmentation ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state, comprises the following steps:
Obtaining a preset number of target instances suitable for copy expansion according to the original labeling information of each initial image in the target detection data set;
Controlling the category and scale proportion of the preset number of target instances according to the category expansion ratio and the scale proportion rule to obtain a preset number of controlled target instances, and determining the initial images whose target space distribution indexes are smaller than a preset threshold value as initial images to be expanded, so as to obtain a preset number of initial images to be expanded;
Obtaining, according to the missing scale proportions of each initial image to be expanded, a background region set in each initial image to be expanded which does not overlap with the preset number of controlled target instances;
Determining a preset number of regions to be expanded in the background region set according to the target space distribution index;
Performing data global amplification on the preset number of regions to be expanded according to the target background-category association model to obtain an amplified image after the amplification of each initial image to be expanded;
judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold value or not;
If not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image;
if yes, obtaining the balanced target detection data set in the balanced state according to the amplified images, among the preset number of initial images to be expanded, whose target spatial distribution indexes are larger than the expected balance threshold.
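As a minimal, non-authoritative sketch of how the expansion probability in the analysis step of claim 1 might be derived from the category distribution (the claim does not fix a formula; the deficit-based weighting and all names below are assumptions):

```python
from collections import Counter

def category_expansion_probabilities(instance_labels):
    """Give under-represented categories a higher probability of being expanded.
    The weighting (deficit relative to the largest category) is an assumed choice."""
    counts = Counter(instance_labels)
    max_count = max(counts.values())
    deficits = {cls: max_count - n for cls, n in counts.items()}
    total_deficit = sum(deficits.values()) or 1  # avoid division by zero when already balanced
    return {cls: d / total_deficit for cls, d in deficits.items()}

# Example: a long-tailed label set
labels = ["car"] * 900 + ["truck"] * 80 + ["bicycle"] * 20
print(category_expansion_probabilities(labels))
# {'car': 0.0, 'truck': ~0.48, 'bicycle': ~0.52}
```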
2. The method for balancing target detection data according to claim 1, wherein the analyzing the filling mask map to obtain the target spatial distribution index corresponding to each initial image includes:
Dividing the filling mask map of each initial image to obtain a grid region consisting of a preset number of grids;
Performing density calculation on each grid in the grid region to obtain a target density;
And determining the region type corresponding to each grid according to the target density, and determining the target spatial distribution index of the initial image according to all the region types of the preset number of grids in the grid region.
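One possible reading of the grid-density analysis in claim 2, sketched with an assumed grid size, density threshold, and index definition (the claim does not specify these):

```python
import numpy as np

def target_space_distribution_index(fill_mask, grid=(4, 4), dense_threshold=0.25):
    """Split the filling mask into a grid, measure target density per cell, label cells
    dense / sparse / empty, and score spatial balance as the fraction of non-empty cells."""
    H, W = fill_mask.shape
    gh, gw = grid
    cell_h, cell_w = H // gh, W // gw
    densities = np.zeros(grid, dtype=float)
    for i in range(gh):
        for j in range(gw):
            cell = fill_mask[i * cell_h:(i + 1) * cell_h, j * cell_w:(j + 1) * cell_w]
            densities[i, j] = float(cell.mean())     # fraction of target pixels in the cell
    region_types = np.where(densities >= dense_threshold, "dense",
                            np.where(densities > 0.0, "sparse", "empty"))
    return float((region_types != "empty").mean()), region_types

mask = np.zeros((128, 128), dtype=np.uint8)
mask[:32, :32] = 1                                   # all targets crowded into one corner
index, _ = target_space_distribution_index(mask)
print(index)                                         # low index -> spatially unbalanced
```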
3. The method of claim 1, wherein the filling the corresponding initial image according to each transformation target instance to be amplified in the transformation annotation information set of each initial image to obtain a local amplified image corresponding to each initial image comprises:
Determining a first target instance to be amplified among all the target instances to be amplified according to the transformation annotation information set of each initial image, and determining a preset number of placement positions in the corresponding initial image according to the size of the first target instance to be amplified; each placement position is located in a blank background region;
Calculating a similarity between the first target instance to be amplified and each placement position respectively, to obtain the similarity corresponding to each placement position;
determining the maximum similarity according to all the similarities, and filling the placement position corresponding to the maximum similarity according to the first target instance to be amplified and the adjusted original labeling information to obtain a new target instance;
Updating the original annotation information in the transformation annotation information set according to the new target instance, and performing mask filling on the same region of the filling mask map to obtain a new initial image corresponding to each initial image;
and obtaining a local amplification image corresponding to each initial image according to the new initial image.
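A rough sketch of the similarity-driven placement recited in claim 3; grayscale-histogram correlation is used here only as an assumed similarity measure, and the candidate boxes are assumed to already match the patch size:

```python
import numpy as np

def best_placement(image, instance_patch, candidate_boxes):
    """Choose the candidate blank-background position most similar to the instance
    patch, then paste the patch there (histograms are assumed non-constant)."""
    def hist(img):
        h, _ = np.histogram(img, bins=32, range=(0, 255), density=True)
        return h

    patch_hist = hist(instance_patch)
    scores = []
    for (x1, y1, x2, y2) in candidate_boxes:
        background = image[y1:y2, x1:x2]
        scores.append(float(np.corrcoef(patch_hist, hist(background))[0, 1]))
    best = int(np.argmax(scores))                    # position with maximum similarity
    x1, y1, x2, y2 = candidate_boxes[best]
    out = image.copy()
    out[y1:y2, x1:x2] = instance_patch               # fill the chosen placement position
    return out, candidate_boxes[best], scores[best]
```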
4. The method of claim 3, wherein obtaining the local amplification image corresponding to each initial image according to the new initial image comprises:
Judging whether any transformation target instance to be amplified remains in the transformation annotation information set;
if yes, taking the new initial image as the initial image, determining a second target instance to be amplified from the transformation annotation information set, and carrying out data local amplification on the initial image according to the second target instance to be amplified;
If not, determining the new initial image as a local amplification image, and obtaining the local amplification image corresponding to each initial image when the data local amplification of each initial image is completed.
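Claim 4 amounts to iterating the local amplification of claim 3 until no transformed instance remains; a compact sketch, with an assumed `amplify_once` callback standing in for the single-instance filling step, could look like:

```python
def locally_amplify(initial_image, transform_set, amplify_once):
    """Keep pasting the remaining transformed target instances one by one, each time
    treating the updated image as the new initial image, until the transformation
    annotation information set is exhausted."""
    image = initial_image
    remaining = list(transform_set)
    while remaining:                            # any transformed instance left?
        instance = remaining.pop(0)             # take the next instance to amplify
        image = amplify_once(image, instance)   # paste it; annotations/mask updated inside
    return image                                # the local amplification image
```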
5. The method of claim 1, wherein training a preset background-class association model according to the preset number of local amplification images to obtain a target background-class association model comprises:
obtaining an extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each target instance to be amplified of each initial image;
Obtaining a background-target image data set according to all the extended background images of the preset number of initial images, and dividing the background-target image data set to obtain a training set and a testing set;
and training the preset background-category association model according to the training set and the testing set to obtain a target background-category association model.
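A deliberately small stand-in for the training step in claim 5, using a linear classifier over flattened background crops; the real background-category association model is not prescribed by the claim, and the preprocessing below is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_background_category_model(background_crops, class_labels, test_size=0.2):
    """Learn which target categories plausibly occur in front of which backgrounds.
    Crops are assumed to have been resized to a common shape beforehand."""
    X = np.stack([c.reshape(-1).astype(np.float32) / 255.0 for c in background_crops])
    y = np.asarray(class_labels)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=0)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model
```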
6. The method for balancing object detection data according to claim 5, wherein the obtaining the extended background image corresponding to each initial image according to the background images of the preset number of placement positions of each object instance to be amplified in each initial image includes:
Dividing the background images of the preset number of placement positions of each target instance to be amplified of each initial image to obtain a preset number of background images corresponding to each target instance to be amplified;
storing the preset number of background images according to the category information in the adjusted original labeling information of each target instance to be amplified, to obtain category background images corresponding to each target instance to be amplified;
And according to the number of images of each category among the category background images of all the target instances to be amplified, carrying out flip expansion on the category background images to obtain the extended background image corresponding to each initial image.
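The flip expansion of claim 6 can be pictured as padding smaller per-category background sets with horizontally flipped copies; the target count (size of the largest category) and the flip choice below are assumptions:

```python
import numpy as np

def flip_expand_class_backgrounds(class_backgrounds):
    """Pad smaller per-category background image sets with horizontally flipped copies
    so every category contributes a comparable number of images."""
    target_count = max(len(images) for images in class_backgrounds.values())
    expanded = {}
    for cls, images in class_backgrounds.items():
        out = list(images)
        i = 0
        while len(out) < target_count and i < len(images):
            out.append(np.fliplr(images[i]))  # horizontally flipped copy
            i += 1
        expanded[cls] = out
    return expanded
```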
7. An object detection data balancing apparatus, comprising:
The data acquisition module is used for acquiring a target detection data set; the target detection data set comprises a preset number of initial images and original annotation information corresponding to each initial image;
the data analysis module is used for analyzing the target detection data set and setting a category expansion ratio, a scale proportion rule and a target space distribution index corresponding to each initial image; the target spatial distribution index is used for carrying out spatial balance on an initial image under the condition of spatial unbalance;
The image amplification module is used for carrying out data local amplification on the preset number of initial images according to the original labeling information to obtain preset number of local amplification images, and training a preset background-category association model according to the preset number of local amplification images to obtain a target background-category association model;
the image balancing module is used for carrying out data global amplification on the target detection data set according to the target background-category association model, the category expansion ratio, the scale proportion rule and the target space distribution index to obtain a balanced target detection data set in a balanced state;
The data analysis module is further used for analyzing the target detection data set to obtain the category distribution of all instances, determining an expansion probability according to the differences among the numbers of instances of each category in the category distribution, and further determining the category expansion ratio according to the expansion probability; performing scale analysis on all the original labeling information in the target detection data set to obtain a scale distribution, and determining the scale proportion rule according to the scale distribution; and determining a filling mask map according to the original labeling information corresponding to each initial image, and analyzing the filling mask map to obtain a target space distribution index corresponding to each initial image;
The image amplification module is further configured to perform instance screening on all instances in the filling mask map of each initial image to obtain non-overlapping target instances to be amplified, and acquire the target instances to be amplified, which contain background information, in each initial image, so as to obtain an annotation information set corresponding to each initial image; the annotation information set comprises all the target instances to be amplified and the corresponding original annotation information; perform spatial pose transformation on all the target instances to be amplified in the annotation information set, and correspondingly adjust the original annotation information corresponding to each target instance to be amplified in the annotation information set, obtaining a transformation annotation information set corresponding to each initial image; and fill the corresponding initial images according to each transformation target instance to be amplified in the transformation annotation information set of each initial image, to obtain local amplified images corresponding to each initial image;
The image balancing module is further used for obtaining a preset number of target instances suitable for copy expansion according to the original labeling information of each initial image in the target detection data set; controlling the category and scale proportion of the preset number of target instances according to the category expansion ratio and the scale proportion rule to obtain a preset number of controlled target instances, and determining the initial images whose target space distribution indexes are smaller than a preset threshold value as initial images to be expanded, so as to obtain a preset number of initial images to be expanded; obtaining, according to the missing scale proportions of each initial image to be expanded, a background region set in each initial image to be expanded which does not overlap with the preset number of controlled target instances; determining a preset number of regions to be expanded in the background region set according to the target space distribution index; performing data global amplification on the preset number of regions to be expanded according to the target background-category association model to obtain an amplified image after the amplification of each initial image to be expanded; judging whether the target spatial distribution index of the amplified image is larger than an expected balance threshold value or not; if not, continuing to perform data global amplification on the initial image to be expanded corresponding to the amplified image; and if yes, obtaining the balanced target detection data set in the balanced state according to the amplified images, among the preset number of initial images to be expanded, whose target spatial distribution indexes are larger than the expected balance threshold.
CN202410024623.6A 2024-01-08 2024-01-08 Target detection data balancing method and device Active CN117523345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410024623.6A CN117523345B (en) 2024-01-08 2024-01-08 Target detection data balancing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410024623.6A CN117523345B (en) 2024-01-08 2024-01-08 Target detection data balancing method and device

Publications (2)

Publication Number Publication Date
CN117523345A CN117523345A (en) 2024-02-06
CN117523345B true CN117523345B (en) 2024-04-23

Family

ID=89742474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410024623.6A Active CN117523345B (en) 2024-01-08 2024-01-08 Target detection data balancing method and device

Country Status (1)

Country Link
CN (1) CN117523345B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470187A (en) * 2018-02-26 2018-08-31 华南理工大学 A kind of class imbalance question classification method based on expansion training dataset
CN110222712A (en) * 2019-04-30 2019-09-10 杰创智能科技股份有限公司 A kind of more special algorithm of target detection based on deep learning
CN112328588A (en) * 2020-11-27 2021-02-05 哈尔滨工程大学 Industrial fault diagnosis unbalanced time sequence data expansion method
CN112560960A (en) * 2020-12-16 2021-03-26 北京影谱科技股份有限公司 Hyperspectral image classification method and device and computing equipment
CN113158891A (en) * 2021-04-20 2021-07-23 杭州像素元科技有限公司 Cross-camera pedestrian re-identification method based on global feature matching
WO2022247005A1 (en) * 2021-05-27 2022-12-01 平安科技(深圳)有限公司 Method and apparatus for identifying target object in image, electronic device and storage medium
WO2023130648A1 (en) * 2022-01-10 2023-07-13 苏州浪潮智能科技有限公司 Image data enhancement method and apparatus, computer device, and storage medium
CN114972982A (en) * 2022-04-22 2022-08-30 中国电子科技集团公司第五十四研究所 Remote sensing image target detection sample unbalance processing method based on improved oversampling
CN114943300A (en) * 2022-06-02 2022-08-26 西安电子科技大学 Unbalanced data classification method for generating countermeasure network based on cycle consistency
CN116188900A (en) * 2023-01-10 2023-05-30 南京大学 Small sample image classification method based on global and local feature augmentation
CN116229205A (en) * 2023-01-13 2023-06-06 上海可明科技有限公司 Lipstick product surface defect data augmentation method based on small sample characteristic migration
CN116580188A (en) * 2023-03-22 2023-08-11 北京科技大学 Data augmentation method and system based on multi-mode image federal segmentation
CN116416136A (en) * 2023-04-17 2023-07-11 北京卫星信息工程研究所 Data amplification method for ship target detection of visible light remote sensing image and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Triplet Nonlocal Neural Network With Dual-Anchor Triplet Loss for High-Resolution Remote Sensing Image Retrieval; Maoding Zhang et al.; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2021-12-31; pp. 2711-2723 *
An imbalanced binary classification method based on contrastive learning using multi-label confidence comparisons within sample-neighbors pair; Xin Gao et al.; Neurocomputing; 2022-10-31; pp. 148-164 *
Research on bearing fault diagnosis under imbalanced data sets based on IWAE; Li Mengnan et al.; Journal of Mechanical Strength (机械强度); 2023-12-31; Vol. 45, No. 3; pp. 569-575 *

Also Published As

Publication number Publication date
CN117523345A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
US20180260648A1 (en) Area of interest boundary extracting method and apparatus, device and computer storage medium
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN107330027B (en) Weak supervision depth station caption detection method
CN111444807B (en) Target detection method, device, electronic equipment and computer readable medium
CN111462109A (en) Defect detection method, device and equipment for strain clamp and storage medium
CN113537070B (en) Detection method, detection device, electronic equipment and storage medium
CN111815576B (en) Method, device, equipment and storage medium for detecting corrosion condition of metal part
CN112308069A (en) Click test method, device, equipment and storage medium for software interface
CN113869138A (en) Multi-scale target detection method and device and computer readable storage medium
CN114429577B (en) Flag detection method, system and equipment based on high confidence labeling strategy
CN112857746A (en) Tracking method and device of lamplight detector, electronic equipment and storage medium
CN117523345B (en) Target detection data balancing method and device
CN115345895B (en) Image segmentation method and device for visual detection, computer equipment and medium
CN109799905B (en) Hand tracking method and advertising machine
Zhou et al. Self-supervised saliency estimation for pixel embedding in road detection
CN115205855B (en) Vehicle target identification method, device and equipment integrating multi-scale semantic information
CN111382643A (en) Gesture detection method, device, equipment and storage medium
CN114972947A (en) Depth scene text detection method and device based on fuzzy semantic modeling
CN111819567A (en) Method and apparatus for matching images using semantic features
CN113902890A (en) Self-supervision data enhancement method, system and equipment for visual concept detection
CN111124862B (en) Intelligent device performance testing method and device and intelligent device
CN112131418A (en) Target labeling method, target labeling device and computer-readable storage medium
CN112232431A (en) Watermark detection model training method, watermark detection method, system, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant