CN113537410A - Universal automatic balancing method for deep learning positive samples - Google Patents


Info

Publication number
CN113537410A
Authority
CN
China
Prior art keywords
image; small; samples; sample; contrast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111071518.0A
Other languages
Chinese (zh)
Other versions
CN113537410B (en)
Inventor
都卫东
王岩松
王天翔
吴健雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focusight Technology Co Ltd
Original Assignee
Focusight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focusight Technology Co Ltd filed Critical Focusight Technology Co Ltd
Priority to CN202111071518.0A priority Critical patent/CN113537410B/en
Publication of CN113537410A publication Critical patent/CN113537410A/en
Application granted granted Critical
Publication of CN113537410B publication Critical patent/CN113537410B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention relates to a universal automatic balancing method for deep learning positive samples, comprising: S1, determining the order in which all images are traversed and processed according to the sample-subset distribution; S2, selecting the optimal channel for each image; S3, taking sliding-window screenshots on the optimal channel, computing the attributes of each captured small image, and classifying the small images by those attributes; S4, determining the number of small images to capture from each large image according to the sample-subset distribution and the total number of required samples; S5, balancing the required quantities according to the per-class quantity ratios; S6, selecting samples according to the required quantities; and S7, sending the resulting samples, matching the required number, into a neural network for training. The method solves the problem of a high over-kill rate in the trained network caused by uneven selection and distribution of the positive samples; it is universal, requires no manual intervention, and runs fully automatically.

Description

Universal automatic balancing method for deep learning positive samples
Technical Field
The invention relates to the technical field of machine-vision image inspection, and in particular to a universal automatic balancing method for deep learning positive samples.
Background
Because the resolution of industrial inspection images is too large, they cannot be fed directly into a neural network when deep learning is used; they must first be processed with a traditional algorithm that extracts suspicious defects, after which a crop of fixed size centered on each suspicious defect is taken, and the resulting small defect images are sent to the network for further judgment. The above describes the inference phase, i.e., the usage phase of the neural network.
In the training stage, negative samples are obtained by cropping the image around each manual annotation; when selecting positive samples, the most primitive method is to slide a window directly over a defect-free image and take every small image produced by the sliding crop as a positive sample.
Later, to avoid an imbalance between the numbers of positive and negative samples, the required number of positive samples was determined at a ratio of n to 1 relative to the negative samples (n is generally 3), coordinates were selected at random on a defect-free image, and n small positive-sample images were cropped.
However, in the scheme that crops the image with a sliding window and takes all resulting small images as positive samples, the number of defects in a real application environment is small while the number of positive samples obtained this way is large, so the positive and negative sample counts become unbalanced, which leads to a high over-kill (false-rejection) rate.
The random-selection method suppresses the numerical imbalance between positive and negative samples, but introduces a new problem:
for most industrial product images, such as glass or mobile phone back shells, when the surface of a product has no defect, the region located at the non-edge is often a flat region, that is, the gray scale aberration is not large, and the number of samples of the type is most; the gray level difference of the edge area is larger, the contrast is higher, the shape difference between the samples in the class is larger than that of the flat area, the diversity requirement of the samples in theory is met, and the occupied quantity is small; this results in that, if a random selection method is used, the probability that the edge area sample is selected is small, the probability that the edge area sample is missed is high, and even if the edge area sample is selected, the number of the edge area sample is small, and finally the trained network is over killed in the edge area.
For the above problems, an equalization scheme based on the distribution of coordinate positions can generally be designed for different inspected products and different imaging effects, but the algorithm must be tailored to each specific product and imaging effect, so it is not universally applicable; it also requires threshold tuning and therefore cannot be automated.
Moreover, industrial inspection images are often acquired on different inspection devices whose imaging conditions are not exactly the same. If the positive samples are all captured from one or a few images on a single device, samples reflecting the imaging conditions of the other devices will be missing, so the trained network works normally on the device from which the positive samples were selected but shows higher over-kill on the other devices.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in industrial inspection image processing, when positive samples are drawn, the selected positive samples may lack diversity, coverage, and representativeness, resulting in a high over-kill rate for the network trained on them.
The technical scheme adopted by the invention to solve this problem is a universal automatic balancing method for deep learning positive samples, comprising the following steps:
S1, numbering the images collected on each of several machines to form sample subsets, and determining the order in which all images are traversed according to the sample-subset distribution;
S2, selecting the optimal channel for each image;
S3, taking sliding-window screenshots on the optimal channel, computing the attributes of each captured small image, and classifying the small images by those attributes;
S4, determining the number of small images to capture from each large image according to the sample-subset distribution and the total number of required samples;
S5, according to the per-class quantity ratios, suppressing the low-relative-contrast subclass within the low-contrast class, increasing the high-relative-contrast subclass of the low-contrast class and the high-contrast class according to an enhancement strategy, and finally determining how many small images of each class to capture from each large image;
S6, once the required number of each class of small image on each large image is determined, selecting among the small images obtained by window sliding on the optimal channel according to a connected-domain balanced selection strategy, mapping each selected screenshot window back onto the original large image, and cropping the small image at that window position on the original image as a sample;
S7, sending the resulting samples, matching the required number, into a neural network for training;
and S8, selecting small images from those not yet selected, according to the established completion strategy, until the required sample number is reached.
The method classifies the positive samples obtained by sliding-window cropping using a binary tree combined with classification features, then selects samples from the resulting classes by balanced selection. This guarantees the diversity of the positive samples, ensures that positive samples that are rare under a particular feature pattern are not omitted, and reduces the over-kill rate of the trained network. The method solves the problem of a high over-kill rate caused by an uneven distribution of selected positive samples, is universal, requires no manual intervention, and runs fully automatically.
The invention has the beneficial effects that:
1. For the problem that random sampling may miss edge regions while a coordinate-based equalization scheme requires a per-product algorithm and is therefore not universal, the invention classifies samples using a binary tree combined with contrast and average gray level; classification and balancing are based on features rather than coordinate distribution, so the algorithm is universally applicable.
2. For the problem that feature-based classification requires threshold tuning and cannot classify and balance automatically, the invention uses feature-distribution statistics combined with an unsupervised automatic classifier to select the optimal image channel and determine the optimal classification thresholds automatically, so the algorithm determines its thresholds and optimizes itself without intervention.
3. For the problem that selecting positive samples from a single image or a single device leads to high over-kill when devices and imaging conditions differ, the invention divides the images obtained on different devices into separate sample subsets and selects samples in a balanced way across the subsets.
Drawings
Fig. 1 shows the traversal and sampling-crop pattern for the large-image samples.
Fig. 2 shows the logic and steps of sample small-image classification.
Fig. 3 is a schematic diagram of the optimal-channel selection method when the image is a color image.
Detailed Description
The invention will now be described in further detail with reference to the drawings and preferred embodiments. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
A universal method for automatic balancing of deep learning positive samples, as shown in figs. 1-3, comprises the following steps:
S1, numbering the images collected on each of several machines to form sample subsets, and determining the order in which all images are traversed according to the sample-subset distribution;
S2, selecting the optimal channel for each image;
S3, taking sliding-window screenshots on the optimal channel, computing the attributes of each captured small image, and classifying the small images by those attributes;
S4, determining the number of small images to capture from each large image according to the sample-subset distribution and the total number of required samples;
S5, according to the per-class quantity ratios, suppressing the low-relative-contrast subclass within the low-contrast class, increasing the high-relative-contrast subclass of the low-contrast class and the high-contrast class according to an enhancement strategy, and finally determining how many small images of each class to capture from each large image;
S6, once the required number of each class of small image on each large image is determined, selecting among the small images obtained by window sliding on the optimal channel according to a connected-domain balanced selection strategy, mapping each selected screenshot window back onto the original large image, and cropping the small image at that window position on the original image as a sample;
S7, sending the resulting samples, matching the required number, into a neural network for training;
and S8, selecting small images from those not yet selected, according to the established completion strategy, until the required sample number is reached.
The following examples are given.
Suppose there are N machines, each of which captures a certain number of images (the counts may differ between machines). Number the machines 1, 2, 3, ..., N, with i (i = 1, 2, 3, ..., N) denoting the machine number; number all images captured by each machine, so the j-th image on machine i is numbered i_j. Each machine i (i = 1, 2, 3, ..., N) can be regarded as one subset.
1. Traversal processing mode
As shown in fig. 1, the order of traversing and processing all sample images follows the sequence numbers 1, 2, 3, ...: traversal starts in the horizontal direction, processes the current row completely, and then moves to the next row;
the first image of the first machine is processed first, then the first image of the second machine, then the first image of the third machine, and so on up to the first image of the N-th (i.e., last) machine; this constitutes one cycle;
then the second image of the first machine is processed, and each image on each machine is processed in turn following this traversal order;
here, "processing" refers to capturing small-image samples from the sample image; the detailed capture strategy is given in the later steps.
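A minimal sketch of this round-robin traversal (the helper name `traversal_order` is hypothetical; per-machine image counts are supplied as a list):

```python
def traversal_order(counts):
    """counts[i] = number of images captured on machine i+1.
    Returns (machine, image) pairs in the cyclic order of step 1:
    one image per machine per cycle, machines visited in order."""
    order = []
    max_images = max(counts)
    for j in range(1, max_images + 1):           # cycle over image index
        for i, n in enumerate(counts, start=1):  # then over machines
            if j <= n:                           # a machine may have fewer images
                order.append((i, j))
    return order

print(traversal_order([2, 3, 1]))
# first cycle: (1,1),(2,1),(3,1); second: (1,2),(2,2); third: (2,3)
```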
2. Selecting an optimal channel:
(1) if the image is a grayscale image, the original image itself is taken as the optimal channel;
(2) if the image is a color image, a first round of the loop shown in fig. 1 is performed in order, and the following operations are applied to each image traversed in the loop:
before the cyclic capture shown in fig. 3 begins, the first cycle is executed once more purely for statistics; the first image of each machine is traversed in first-cycle order without capturing, only statistical data are collected, and the optimal channel is selected from those statistics.
the specific method comprises the following steps: and (3) performing channel decomposition on each image in the traversal (the number of subsets is 3, and the first round of circulation corresponds to 3 images in the graph) to obtain: red channel R, blue channel G, green channel B, color transition Gray scale graph Gray = R0.299 + G0.587 + B0.114, saturation channel S, luminance channel V.
And binarizing images of Gray, R, G, B, S and V, 6 channels decomposed from each big image by using an Otsu method, and subtracting the Gray level of a bright area from the average value of the Gray level of a dark area to obtain the contrast of each channel.
As shown in fig. 3, the channel with the largest contrast is selected from the images of 6 channels, i.e. Gray, R, G, B, S and V, which are decomposed from each map, and is denoted as Idexi, i is the number of the subset (machine); and obtaining the channel with the maximum contrast corresponding to each big image in the first round of cycle, namely Idex1, Idex2 and Idex3, and then selecting the channel selected the most times as the best channel, namely Idex.
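The per-channel contrast measure and the best-channel vote can be sketched in pure Python (8-bit gray values supplied as flat lists; `otsu_threshold` and `channel_contrast` are illustrative names, not the patent's):

```python
def otsu_threshold(pixels):
    """Otsu's method on a flat list of 0-255 gray values."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_b, sum_b = 0, -1.0, 0, 0.0
    for t in range(256):
        w_b += hist[t]                    # background weight
        if w_b == 0:
            continue
        w_f = total - w_b                 # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def channel_contrast(pixels):
    """Light-dark contrast of one channel after Otsu binarization:
    mean(bright region) - mean(dark region), 0 if one region is empty."""
    t = otsu_threshold(pixels)
    bright = [v for v in pixels if v > t]
    dark = [v for v in pixels if v <= t]
    if not bright or not dark:
        return 0.0
    return sum(bright) / len(bright) - sum(dark) / len(dark)

# the channel with the largest contrast wins the vote for this image
channels = {"Gray": [10] * 50 + [200] * 50, "R": [90] * 50 + [110] * 50}
best = max(channels, key=lambda k: channel_contrast(channels[k]))
print(best)  # "Gray"
```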
3. Classification scheme
After the optimal channel has been selected, the traversal screenshots shown in fig. 1 formally begin and are classified, following the traversal order of step 1.
As shown in fig. 2, within each cycle of fig. 1 the following thresholds are obtained by statistics: the contrast threshold Tc, the average-gray threshold Tg, the polymerization-degree threshold Tp, the boundary-entropy threshold Te, the dark-side relative-contrast threshold Tl, and the bright-side relative-contrast threshold Th.
Samples classified into the low-contrast, low-gray class are further divided into low-relative-contrast and high-relative-contrast subclasses, with Tl as the dividing threshold;
similarly, samples classified into the low-contrast, high-gray class are further divided into low-relative-contrast and high-relative-contrast subclasses, with Th as the dividing threshold.
The small images cropped from each large image are classified using thresholds obtained statistically with an automatic classifier; the specific procedure is as follows:
(1) Traverse the current round, i.e., the current row shown in fig. 1. According to the small-image size required for training the neural network, slide a window of height H and width W over each large image from the upper-left corner to the lower-right corner in row-major order, using H/2 and W/2 as the vertical and horizontal step lengths, and crop the small images in sequence; compute each small image's attribute values for classification as follows:
contrast c:
before sliding, binarize the large image currently being cropped using Otsu's method to obtain the bright (higher-gray) and dark (lower-gray) regions of the large image; each small image cropped as the window slides is mapped back to the large image via the window's coordinate position to obtain the corresponding bright and dark regions, and c is computed as follows:
if the large-image position corresponding to the small-image window contains only a bright region or only a dark region, c = 0;
if both a bright and a dark region are present, c = MeanLight − MeanDark, where MeanLight and MeanDark are the average gray levels of the bright and dark regions, respectively, at the large-image position corresponding to the small-image window;
average gray g:
i.e., the average gray level of the cropped small image;
degree of polymerization p:
if the large-image position corresponding to the small-image window contains only a bright region or only a dark region, the polymerization degree is 1;
if both bright and dark regions are present, decompose each into connected domains; compute the area of each bright-region connected domain, AreaLights = [AreaLight1, AreaLight2, AreaLight3, ...], and of each dark-region connected domain, AreaDarks = [AreaDark1, AreaDark2, AreaDark3, ...], then apply the formula p = min(max(AreaLights), max(AreaDarks)) / s;
that is, take the maximum connected-domain area within the bright region and within the dark region, take the smaller of these two maxima, and divide by the small-image area s = W × H to obtain the polymerization degree p.
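A sketch of the polymerization-degree computation on a small binarized patch (0 = dark, 1 = bright; 4-connectivity is assumed, since the text does not specify the connectivity):

```python
from collections import deque

def _component_areas(mask):
    """Areas of 4-connected components of True cells in a 2-D mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                area, q = 0, deque([(y, x)])
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                areas.append(area)
    return areas

def polymerization_degree(binary):
    """p = min(max bright-component area, max dark-component area) / (W*H);
    1.0 when the patch is all bright or all dark, as in step 3."""
    h, w = len(binary), len(binary[0])
    bright = _component_areas([[v == 1 for v in row] for row in binary])
    dark = _component_areas([[v == 0 for v in row] for row in binary])
    if not bright or not dark:
        return 1.0
    return min(max(bright), max(dark)) / (w * h)

patch = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
print(polymerization_degree(patch))  # min(max bright=4, max dark=11)/16 = 0.25
```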
Boundary entropy e:
if the large-image position corresponding to the small-image window contains only a bright region or only a dark region, e = 0;
if both a bright and a dark region are present, then
e = −Σ_{ig=0}^{255} p_ig · log(p_ig)
where ig ranges over the pixel gray values 0-255 and p_ig is the ratio of the number of pixels with gray level ig in the boundary region to the total number of pixels in the boundary region; the boundary region is the band obtained by expanding the border between the bright and dark regions outward by a radius r, r being defined by the formula shown in the accompanying figure.
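Under the usual reading of the formula above as a Shannon entropy over the boundary band's gray histogram, a sketch is (extraction of the band itself is assumed done; only its pixel values are passed in):

```python
import math

def boundary_entropy(gray_values):
    """e = -sum p_ig * log(p_ig) over the gray histogram of the boundary
    band between the bright and dark regions; 0 for an empty or
    uniform band."""
    if not gray_values:
        return 0.0
    hist = {}
    for v in gray_values:
        hist[v] = hist.get(v, 0) + 1
    n = len(gray_values)
    return -sum((c / n) * math.log(c / n) for c in hist.values())

print(boundary_entropy([100] * 8))               # uniform band -> 0.0
print(round(boundary_entropy([0, 255] * 4), 4))  # two equal bins -> log(2) ~ 0.6931
```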
relative contrast:
this is divided into cl, the relative contrast of samples with low average gray, and ch, the relative contrast of samples with high average gray;
both cl and ch are computed as follows:
apply Otsu threshold segmentation to the small image; if the segmentation result contains only a bright region or only a dark region, the relative contrast is 0;
if the segmentation result contains both bright and dark regions, the relative contrast is the difference between their average gray levels.
The complementary polymerization operation is as follows:
in fig. 2, the complementary polymerization operation is applied to the low-polymerization-degree class. Specifically, for a high-contrast sample falling into the low-polymerization class, compare the areas of the smallest connected domains in the bright and dark regions; form a translation vector from the center of the larger-area connected domain (origin) to the center of the smaller-area connected domain (end point); then slide the screenshot window of the current small image along this vector with a step length of 5, bounded by min(W, H)/2. If the polymerization degree exceeds 0.3 at any point during sliding, replace the original small image with the one captured by the current window; if no small image exceeds 0.3 after sliding completes, replace the original with the small image of largest polymerization degree among all those obtained during sliding.
(2) After the classification attribute values of all small images in the current round have been obtained as above, all sample attribute values to be classified are sent to a smoothed-histogram classifier following the binary-tree logic of fig. 2. The classification threshold for the attribute currently in use is computed as follows:
normalize the attribute values to the range 0-255, determine the classification threshold with a Gaussian-smoothed histogram method, then map the threshold found in the 0-255 range back to the original range to obtain the attribute's classification threshold;
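A hypothetical sketch of this normalize-smooth-threshold procedure (the text does not give the smoothing width or the valley-picking rule, so the sigma value and the two-peaks-then-valley choice below are assumptions):

```python
import math

def smoothed_histogram_threshold(values, sigma=3.0):
    """Normalize attribute values to 0-255, Gaussian-smooth the histogram,
    take the lowest valley between the two dominant peaks as the split
    point, and map it back to the original value range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    hist = [0.0] * 256
    for v in values:
        hist[round((v - lo) * 255 / (hi - lo))] += 1
    # Gaussian smoothing with replicated borders
    radius = int(3 * sigma)
    kernel = [math.exp(-j * j / (2 * sigma * sigma)) for j in range(-radius, radius + 1)]
    ksum = sum(kernel)
    smooth = [sum(hist[min(max(i + j, 0), 255)] * kernel[j + radius]
                  for j in range(-radius, radius + 1)) / ksum
              for i in range(256)]
    # the two highest local maxima are taken as the dominant peaks
    peaks = [i for i in range(1, 255)
             if smooth[i] >= smooth[i - 1] and smooth[i] >= smooth[i + 1]]
    peaks.sort(key=lambda i: smooth[i], reverse=True)
    if len(peaks) < 2:
        return lo + (hi - lo) / 2  # degenerate histogram: fall back to midpoint
    p1, p2 = sorted(peaks[:2])
    valley = min(range(p1, p2 + 1), key=lambda i: smooth[i])
    return lo + valley * (hi - lo) / 255  # map back to the original range

t = smoothed_histogram_threshold([0, 100] + [30] * 100 + [70] * 100)
print(30 < t < 70)  # the threshold falls between the two modes
```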
All calculations in this step are performed on the optimal channel selected in step 2.
4. Demand quantity allocation
Let N be the demand given by network training, i.e., a total of N sample small images are required; let n_sub be the number of subsets of large images obtained from different devices (e.g., n_sub = 3 in fig. 1), and n_min the number of images in the subset with the fewest images (e.g., n_min = 3 in fig. 1);
in the loop shown in fig. 1, the number of small samples to take from each large image is need_image = N / (n_sub × n_min); on the large image currently being processed, as shown in fig. 2, the required number of sample small images for class 1 (low average gray), class 2 (high average gray), class 3 (low boundary entropy), and class 4 (high boundary entropy) is need_i = need_image / 4.
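The allocation arithmetic as a sketch (integer division is an assumption; the text does not say how fractional demands are rounded):

```python
def allocate(N, n_sub, n_min):
    """need_image = N / (n_sub * n_min) small samples per large image,
    split evenly over the four first-level classes."""
    need_image = N // (n_sub * n_min)  # assumed integer division
    need_i = need_image // 4
    return need_image, need_i

print(allocate(3600, 3, 3))  # (400, 100): 400 per large image, 100 per class
```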
5. demand quantity balancing
Taking class 1 as an example: following the method and description of step "3. Classification scheme", class 1 is split by Tl into class 1.1 (low relative contrast) and class 1.2 (high relative contrast);
class 1 corresponds to the low-contrast, low-gray class of step 3; the samples of that class are divided again into low-relative-contrast and high-relative-contrast subclasses, the former being class 1.1 and the latter class 1.2;
let their counts on the current large image be n1 and n2 respectively, so their ratio is n2/n1; the class-1 demand is distributed to classes 1.1 and 1.2 in inverse proportion to these counts, with each class receiving at least 1, i.e., need_1.1 = max(1, need_1 × (n2/(n1+n2))), need_1.2 = max(1, need_1 × (n1/(n1+n2)));
where need_1 is the class-1 demand on the current large image, and need_1.1 and need_1.2 are the quantities allocated to classes 1.1 and 1.2.
Let the numbers of class-1.1 and class-1.2 samples on the current large image be n_1.1 and n_1.2 respectively.
If n_1.2 < need_1.2, the shortfall is assigned to classes 3 and 4,
i.e.
need_3 = need_3 + [(need_1 − (need_1.1 + n_1.2))/2];
need_4 = need_4 + [(need_1 − (need_1.1 + n_1.2))/2].
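The inverse-proportion split and the shortfall reassignment can be sketched as follows (the rounding behavior is an assumption; function names are illustrative):

```python
def balance_class1(need_1, n1, n2):
    """Split need_1 between classes 1.1 and 1.2 in inverse proportion to
    their observed counts n1 and n2, each share at least 1 (step 5)."""
    total = n1 + n2
    need_11 = max(1, round(need_1 * n2 / total))  # rare class gets the larger share
    need_12 = max(1, round(need_1 * n1 / total))
    return need_11, need_12

def reassign_shortfall(need_1, need_11, n_12, need_3, need_4):
    """If class 1.2 has only n_12 < need_1.2 samples, the shortfall
    need_1 - (need_1.1 + n_1.2) is split evenly onto classes 3 and 4."""
    shortfall = need_1 - (need_11 + n_12)
    half = shortfall // 2
    return need_3 + half, need_4 + (shortfall - half)

# 100 samples wanted for class 1; 90 low-relative-contrast vs 10 high
need_11, need_12 = balance_class1(100, 90, 10)
print(need_11, need_12)  # 10 90
```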
Similarly, classes 2.1 and 2.2 within class 2 correspond to the subclasses with relative contrast below and above Th in step "3. Classification scheme", and their required numbers are balanced by the same method.
6. Selecting samples by required number
The screenshots saved in this step are obtained by performing the screenshot operation on the original large image.
The required number of each class of samples on each large image in the cycle is determined as above; the classified sample small images are then selected according to the required numbers, as follows:
(1) if the number of small images of the current class on the current image, has_i, is less than or equal to the class's required number need_i on that image (i denotes the class, i = 1.1, 1.2, 2.1, 2.2, 3, 4), all small images of that class on the image are selected and saved directly;
(2) if has_i > need_i, select as follows:
map the positions of all windows of the current class on the current image back onto the large image, so that the windows form rectangular areas on the large image, then apply connected-domain processing to the resulting area.
Let there be n_c connected domains. Traverse the 1st, 2nd, ..., n_c-th connected domain in turn, randomly selecting one sample from the connected domain currently being visited; if the selected count reaches the required number partway through, the whole sample-balancing step stops and the obtained samples, matching the required number, are sent to the neural network for training.
If the number selected in the first pass is below the required number, traverse the 1st, 2nd, ..., n_c-th connected domain again, randomly selecting one not-yet-selected sample from the current connected domain; if all samples in a connected domain have already been selected, skip it.
Continue selecting samples by this traverse-and-randomly-select procedure over the connected domains, and exit the loop once the number selected reaches the class's required number on that image.
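A sketch of this traverse-and-randomly-select loop over connected domains (samples are represented by ids grouped per domain; the fixed seed exists only to make the sketch reproducible):

```python
import random

def select_balanced(domains, need, seed=0):
    """Round-robin selection over connected domains: each pass visits the
    domains in order and randomly picks one not-yet-selected sample from
    each non-empty domain, until `need` samples are chosen or all
    domains are exhausted. `domains` is a list of lists of sample ids."""
    rng = random.Random(seed)
    remaining = [list(d) for d in domains]
    chosen = []
    while len(chosen) < need and any(remaining):
        for dom in remaining:
            if len(chosen) >= need:
                break
            if dom:  # skip fully consumed domains
                chosen.append(dom.pop(rng.randrange(len(dom))))
    return chosen

picked = select_balanced([[1, 2, 3], [4], [5, 6]], need=4)
print(sorted(picked))  # one sample per domain first, then a second pass
```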
7. Shortfall and completion handling
The shortfall handling is as follows:
in the loop above, if the number of samples of the current class on the current image is less than the number required of that class for that image, all samples of that class on the image are selected.
The completion handling is as follows:
as shown in fig. 1, after all cycles are finished, if the number of selected small images N_select has not reached the total demand N, count the total number N_reduce of all unselected samples in classes 1.2, 2.2, 3, and 4; if these cannot fill, or exactly fill, the gap, i.e., N_reduce <= N − N_select, select all unselected small images of those classes and merge them into the final set of selected samples.
If N_reduce > N − N_select, traverse classes 1.2, 2.2, 3, and 4 in turn, randomly selecting one small image from those not yet selected in the current class, and repeat the loop until the total number selected (i.e., the total of small images selected in this completion process plus those selected above) reaches the total demand, stopping immediately once it does.
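The completion strategy, sketched with leftovers grouped by class (the class keys and the fixed traversal order 1.2, 2.2, 3, 4 follow the text; everything else is illustrative):

```python
import random

def complete(selected, leftovers_by_class, N, seed=0):
    """If the selected count misses the total demand N, draw from the
    unselected samples of classes 1.2, 2.2, 3, 4: take all of them when
    they just fill (or cannot fill) the gap, otherwise round-robin
    random picks until N is reached."""
    rng = random.Random(seed)
    gap = N - len(selected)
    pool = {k: list(v) for k, v in leftovers_by_class.items()}
    n_reduce = sum(len(v) for v in pool.values())
    if n_reduce <= gap:               # cannot (or exactly) fill: take everything
        for v in pool.values():
            selected.extend(v)
        return selected
    order = ["1.2", "2.2", "3", "4"]  # fixed class traversal order from the text
    while len(selected) < N:
        for cls in order:
            if len(selected) >= N:
                break
            if pool[cls]:
                selected.append(pool[cls].pop(rng.randrange(len(pool[cls]))))
    return selected

out = complete(list(range(7)), {"1.2": [70], "2.2": [80, 81], "3": [], "4": [90]}, N=10)
print(len(out))  # 10
```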
While particular embodiments of the present invention have been described in the foregoing specification, various modifications and alterations to the previously described embodiments will become apparent to those skilled in the art from this description without departing from the spirit and scope of the invention.

Claims (9)

1. A universal automatic balancing method for deep learning positive samples, characterized by comprising the following steps:
S1, numbering the images collected on each of several machines to form sample subsets, and determining the order in which all images are traversed according to the sample-subset distribution;
S2, selecting the optimal channel for each image;
S3, taking sliding-window screenshots on the optimal channel, computing the attributes of each captured small image, and classifying the small images by those attributes;
S4, determining the number of small images to capture from each large image according to the sample-subset distribution and the total number of required samples;
S5, according to the per-class quantity ratios, suppressing the low-relative-contrast subclass within the low-contrast class, increasing the high-relative-contrast subclass of the low-contrast class and the high-contrast class according to an enhancement strategy, and finally determining how many small images of each class to capture from each large image;
S6, once the required number of each class of small image on each large image is determined, selecting among the small images obtained by window sliding on the optimal channel according to a connected-domain balanced selection strategy, mapping each selected screenshot window back onto the original large image, and cropping the small image at that window position on the original image as a sample;
and S7, sending the resulting samples, matching the required number, into a neural network for training.
2. The universal automatic balancing method for deep learning positive samples according to claim 1, characterized by further comprising the following step:
S8, selecting small images from all small images not yet selected, according to an established completion strategy, so as to reach the required number of samples.
3. The universal automatic balancing method for deep learning positive samples according to claim 1 or 2, characterized in that: in step S2, if the image is a grayscale image, the image itself is used as the optimal channel; if the image is a color image, the image is decomposed into six channel images: grayscale conversion, red, green, blue, saturation, and brightness; among these six images, the one with the highest bright-dark contrast, obtained by a bright-dark contrast method based on global binarization, is selected as the optimal channel of the image; after optimal-channel selection has been performed for all images, the channel chosen most often is counted and used as the optimal channel.
4. The universal automatic balancing method for deep learning positive samples according to claim 1 or 2, characterized in that: in step S3, according to the small-image size required by the neural network to be trained, a window of height H and width W is slid row by row from the top-left corner to the bottom-right corner of each large image, with H/2 and W/2 as the vertical and horizontal step sizes, and small images are cropped in sequence; the attribute values used for classification are calculated for each small image; all sample attribute values to be classified are then fed into a smoothed-histogram classifier and classified according to binary-tree logic.
5. The universal automatic balancing method for deep learning positive samples according to claim 4, characterized in that: the attribute values used to classify the small images comprise the contrast c, the average gray level g, the aggregation degree p, and the boundary entropy e; the thresholds for the classification attribute values are: a contrast threshold Tc, an average gray threshold Tg, an aggregation-degree threshold Tp, a boundary entropy threshold Te, a dark contrast threshold Tl, and a bright contrast threshold Th.
6. The universal automatic balancing method for deep learning positive samples according to claim 5, characterized by having a supplementary aggregation step for the low-aggregation-degree classes, specifically as follows:
among the small images classified as high contrast but with a low aggregation degree, the areas of the smallest connected domains in the bright region and the dark region are compared, and a translation vector is formed with the center of the larger connected domain as its origin and the center of the smaller connected domain as its endpoint; along this vector direction, the capture window corresponding to the current small image slides with a set distance as the step length and min(W, H)/2 as the boundary; if the aggregation degree exceeds a set value during sliding, the small image captured by the current window replaces the original small image; if the sliding finishes without the aggregation degree of any small image exceeding the set value, the small image with the largest aggregation degree among all small images obtained during sliding replaces the original one.
7. The universal automatic balancing method for deep learning positive samples according to claim 1 or 2, characterized in that: in step S6, the classified small-image samples are selected according to the required number, the selection method being:
1) if the number have_i of small-image samples of the current class in the current image is less than or equal to the required number need_i of that class in the current image, where i denotes the class and i = 1.1, 1.2, 2.1, 2.2, 3, 4, all small images of that class on the image are selected and stored directly;
2) if have_i > need_i, the positions of all capture windows of the current class on the current image are mapped onto the large image, i.e., these windows are composed, as rectangles, into regions on the large image, and connected-domain processing is then performed on the resulting regions.
8. The universal automatic balancing method for deep learning positive samples according to claim 7, characterized in that: in step 2), assuming there are n_c connected domains, the 1st, 2nd, ..., n_c-th connected domains are traversed in sequence, and a sample is randomly selected from the connected domain currently being traversed; if the number of selected samples reaches the required number midway, the entire sample balancing procedure is stopped, and the obtained samples, whose number matches the required number, are fed into a neural network for training; if the number of selected samples is less than the required number, the 1st, 2nd, ..., n_c-th connected domains are traversed again in sequence, and an unselected sample is randomly selected from the connected domain currently being traversed; if all samples in a connected domain have already been selected, that connected domain is skipped.
9. The universal automatic balancing method for deep learning positive samples according to claim 2, characterized in that: in step S8, if the number N_select of selected small images fails to reach the total required number N, it is checked whether the total number N_reduce of all unselected samples in the classes with high relative contrast, low boundary entropy, and high boundary entropy cannot reach, or exactly reaches, the shortfall, i.e., whether N_reduce <= N - N_select; if so, all these unselected small images are selected and merged into the finally selected samples; if not, the classes with high relative contrast, low boundary entropy, and high boundary entropy are traversed in sequence, one small-image sample being randomly selected from the unselected small-image samples of the current class, and this loop is repeated until the total number selected reaches the total required number; the loop is stopped if the total number selected, i.e., the total number of small-image samples selected during the completion process together with the preceding steps, reaches the total required number N midway.
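The connected-domain balanced selection of claim 8 can be illustrated with the following sketch. The representation of each connected domain as a list of candidate samples, and the function name `select_balanced`, are assumptions for illustration; the patent itself does not prescribe a data structure.

```python
import random

def select_balanced(connected_domains, need):
    """Traverse the connected domains in order, randomly taking one
    unselected sample from each; repeat the sweep, skipping exhausted
    domains, until `need` samples are chosen or nothing remains."""
    pools = [list(d) for d in connected_domains]  # working copies
    selected = []
    while len(selected) < need and any(pools):
        for pool in pools:
            if len(selected) >= need:
                break  # required number reached midway: stop
            if pool:  # skip domains whose samples are all selected
                selected.append(pool.pop(random.randrange(len(pool))))
    return selected
```

Sweeping the domains round-robin rather than draining them one by one is what balances the selection across connected domains, as the claim intends.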
CN202111071518.0A 2021-09-14 2021-09-14 Universal automatic balancing method for deep learning positive samples Active CN113537410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111071518.0A CN113537410B (en) 2021-09-14 2021-09-14 Universal automatic balancing method for deep learning positive samples


Publications (2)

Publication Number Publication Date
CN113537410A true CN113537410A (en) 2021-10-22
CN113537410B CN113537410B (en) 2021-12-07

Family

ID=78092468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111071518.0A Active CN113537410B (en) 2021-09-14 2021-09-14 Universal automatic balancing method for deep learning positive samples

Country Status (1)

Country Link
CN (1) CN113537410B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612799A (en) * 2022-03-11 2022-06-10 应急管理部国家自然灾害防治研究院 Space self-adaptive positive and negative sample generation method and system based on landslide/non-landslide area ratio

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103996050A (en) * 2014-05-08 2014-08-20 清华大学深圳研究生院 Guard net detection method based on Fourier spectrum under polar coordinates
CN106022338A (en) * 2016-05-23 2016-10-12 麦克奥迪(厦门)医疗诊断系统有限公司 Automatic ROI (Regions of Interest) detection method of digital pathologic full slice image
CN106898035A (en) * 2017-01-19 2017-06-27 博康智能信息技术有限公司 A kind of dress ornament sample set creation method and device
CN107480628A (en) * 2017-08-10 2017-12-15 苏州大学 A kind of face identification method and device
CN107679074A (en) * 2017-08-25 2018-02-09 百度在线网络技术(北京)有限公司 A kind of Picture Generation Method and equipment
CN109711228A (en) * 2017-10-25 2019-05-03 腾讯科技(深圳)有限公司 A kind of image processing method that realizing image recognition and device, electronic equipment
CN111199214A (en) * 2020-01-04 2020-05-26 西安电子科技大学 Residual error network multispectral image ground feature classification method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU Chunzhi et al.: "An effective imbalanced-sample generation method and its application in planetary gearbox fault diagnosis", Acta Armamentarii *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612799A (en) * 2022-03-11 2022-06-10 应急管理部国家自然灾害防治研究院 Space self-adaptive positive and negative sample generation method and system based on landslide/non-landslide area ratio
CN114612799B (en) * 2022-03-11 2022-09-16 应急管理部国家自然灾害防治研究院 Space self-adaptive positive and negative sample generation method and system based on landslide/non-landslide area ratio

Also Published As

Publication number Publication date
CN113537410B (en) 2021-12-07

Similar Documents

Publication Publication Date Title
US20190362484A1 (en) Patch selection for neural network based no-reference image quality assessment
CN107871316B (en) Automatic X-ray film hand bone interest area extraction method based on deep neural network
CN110619333B (en) Text line segmentation method, text line segmentation device and electronic equipment
CN103971134B (en) Image classification, retrieval and bearing calibration, and related device
CN108664839B (en) Image processing method and device
CN111695373B (en) Zebra stripes positioning method, system, medium and equipment
CN109740721A (en) Wheat head method of counting and device
CN113792827B (en) Target object recognition method, electronic device, and computer-readable storage medium
CN109035254A (en) Based on the movement fish body shadow removal and image partition method for improving K-means cluster
US20190272627A1 (en) Automatically generating image datasets for use in image recognition and detection
CN108241821A (en) Image processing equipment and method
CN113537410B (en) Universal automatic balancing method for deep learning positive samples
CN110837809A (en) Blood automatic analysis method, blood automatic analysis system, blood cell analyzer, and storage medium
CN114581723A (en) Defect classification method, device, storage medium, equipment and computer program product
CN109509188A (en) A kind of transmission line of electricity typical defect recognition methods based on HOG feature
CN113222959A (en) Fresh jujube wormhole detection method based on hyperspectral image convolutional neural network
CN112347805A (en) Multi-target two-dimensional code detection and identification method, system, device and storage medium
CN108305270A (en) A kind of storage grain worm number system and method based on mobile phone photograph
CN111046782A (en) Fruit rapid identification method for apple picking robot
CN110826571A (en) Image traversal algorithm for image rapid identification and feature matching
CN111667509B (en) Automatic tracking method and system for moving target under condition that target and background colors are similar
CN106920266A (en) The Background Generation Method and device of identifying code
CN107545565A (en) A kind of solar energy half tone detection method
CN108711139B (en) One kind being based on defogging AI image analysis system and quick response access control method
CN115546141A (en) Small sample Mini LED defect detection method and system based on multi-dimensional measurement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Universal Deep Learning Positive Sample Automatic Equalization Method

Effective date of registration: 20231226

Granted publication date: 20211207

Pledgee: Industrial and Commercial Bank of China Changzhou Wujin Branch

Pledgor: FOCUSIGHT TECHNOLOGY Co.,Ltd.

Registration number: Y2023980074344
