CN113743470B - AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box - Google Patents

AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Info

Publication number
CN113743470B
CN113743470B
Authority
CN
China
Prior art keywords
garbage
frame
loss
target
centroid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110892805.1A
Other languages
Chinese (zh)
Other versions
CN113743470A (en)
Inventor
鲍承德
黄正
陈洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lianyun Environment Engineering Co ltd
Original Assignee
Zhejiang Lianyun Environment Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lianyun Environment Engineering Co ltd filed Critical Zhejiang Lianyun Environment Engineering Co ltd
Priority to CN202110892805.1A priority Critical patent/CN113743470B/en
Publication of CN113743470A publication Critical patent/CN113743470A/en
Application granted granted Critical
Publication of CN113743470B publication Critical patent/CN113743470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02W CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W 30/00 Technologies for solid waste management
    • Y02W 30/10 Waste collection, transportation, transfer or storage, e.g. segregated refuse collecting, electric or hybrid propulsion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the field of garbage recognition algorithms, and in particular to a method, based on an AI algorithm, for improving the garbage recognition precision of an automatic bag-breaking classification box, comprising the following steps. Step 1: make a data set. Step 2: call a garbage sample image from the training set or verification set and extract multi-scale features with a feature extraction network. Step 3: send the multi-scale features fused by ResNet + FPN in step 2 into an RPN network, and screen the garbage target-region candidate boxes. Step 4: from the garbage target-region boxes screened in step 3, keep the high-quality, high-precision garbage target candidate boxes while rejecting non-garbage candidate boxes, and bind the features of the garbage target candidate boxes to the garbage components. Step 5: evaluate the garbage-component segmentation result and output it. With these steps, the method enables garbage classification-recognition training of the system, improves model precision, and thereby improves the recognition rate of garbage components.

Description

AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
Technical Field
The invention relates to the field of garbage recognition algorithms, in particular to a garbage recognition precision improving method of an automatic bag breaking classification box based on an AI algorithm.
Background
With rising living standards and growing consumption, cities generate ever more garbage, the problem of environmental pollution grows more serious, and garbage recycling has become an important part of the sustainable development strategy. Because garbage is usually a mixture of many types of objects, it must be sorted before it is recycled, so that each type of garbage can be recovered and treated separately, and sorted disposal must start at the source. In current garbage classification processing, a garbage image training set is obtained and its feature vectors are extracted with a convolutional neural network; the feature vectors are fed into a single nonlinear classifier, which is trained with supervision from the garbage label information to obtain a classification model, and target garbage pictures are classified by this model to obtain the garbage classification result. However, the network built by this method is shallow, so its ability to fit the features of garbage pictures is weak and its classification accuracy is low; the garbage classification model therefore classifies garbage with low accuracy.
Disclosure of Invention
In order to solve the problems, the invention aims to provide a garbage identification precision improving method of an automatic bag-breaking classification box based on an AI algorithm.
In order to achieve the purpose, the invention adopts the following technical scheme:
the AI algorithm-based garbage recognition precision improvement method for the automatic bag breaking classification box is characterized by comprising the following steps of: the method comprises the following steps:
step 1: making a data set; manually collecting a rubbish sample image, and labeling a rubbish component label to which the rubbish sample image belongs; dividing the data set into a training set, a verification set and a test set, and carrying out normalization and pretreatment on the data set;
step 2: calling a garbage sample image in a training set or a verification set, and extracting multi-scale features by using a feature extraction network; the method comprises the following steps that a feature extraction network selects Deformable Convolution constraint to build a deep residual error network ResNet, and is combined with an FPN network;
and step 3: sending the multi-scale features extracted by ResNet + FPN fusion in the step2 into an RPN network, wherein the RPN network can automatically generate a plurality of anchors with different sizes, obtaining corresponding candidate frames on an original image according to the anchors, counting the overlapping area of the candidate frames and the training set label area, and screening target area frame selection according to the size of the overlapping area; obtaining a loss value of the screened target area candidate frames through a secondary classification and regression equation, judging whether the objects in the target frames are garbage or not according to the loss value, and simultaneously rejecting non-garbage candidate frames;
and 4, step 4: selecting frames of the garbage target area screened in the step 3, obtaining a loss value through a multi-classification and regression equation, reserving a high-quality and high-precision garbage target candidate frame according to the loss value, simultaneously rejecting a non-garbage candidate frame, and binding the characteristics of the garbage target candidate frame with garbage components;
and 5: generating segmentation masks of the garbage images, outputting the segmentation masks in a log mode, carrying out binarization by using a threshold value, generating segmentation masks of a background and a foreground, and evaluating a garbage component segmentation result by using a similarity coefficient; and outputting the garbage component segmentation result.
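As an illustration of step 5, the following is a minimal sketch assuming PyTorch, with the Dice coefficient standing in for the similarity coefficient; the function name and the 0.5 threshold are illustrative assumptions, not taken from the patent:

```python
import torch

def binarize_and_dice(mask_logits, gt_mask, thresh=0.5):
    """Step 5 sketch: turn mask logits into a binary foreground/background
    mask, then score it against the ground-truth mask with the Dice
    coefficient 2|A∩B| / (|A| + |B|)."""
    prob = torch.sigmoid(mask_logits)      # logits -> probabilities
    pred = (prob > thresh).float()         # threshold binarization
    inter = (pred * gt_mask).sum()
    dice = 2 * inter / (pred.sum() + gt_mask.sum() + 1e-6)
    return pred, dice.item()
```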
By adopting the above technical scheme, the method enables garbage classification-recognition training of the system, improves model precision, and thereby improves the recognition rate of garbage components.
Preferably, in step 1, since garbage images are of great variety and the collected garbage samples cannot cover every kind of garbage, the data set is augmented with flipping, rotation, translation, and brightness and color changes, enlarging the data set; for garbage samples of different shapes, the higher-order Mixup, Cutout and Mosaic data enhancement methods are adopted, which increases both the size and the complexity of the data set, helps the network model understand the data more deeply, and facilitates the training and precision improvement of the network model.
The Mixup step: assume batch1_x is a batch of samples and batch1_y is the label set corresponding to that batch; batch2_x is another batch of samples and batch2_y is its corresponding label set; λ is a mixing coefficient drawn from a Beta distribution with parameters α and β. The mixup principle formulas are:
λ ~ Beta(α, β)
mixed_batch_x = λ·batch1_x + (1 − λ)·batch2_x
mixed_batch_y = λ·batch1_y + (1 − λ)·batch2_y
where Beta refers to the Beta distribution, mixed_batch_x is the mixed batch of samples, and mixed_batch_y is the label corresponding to the mixed batch.
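A minimal sketch of the Mixup formulas above, assuming PyTorch tensors and one-hot labels; the function name and shapes are illustrative:

```python
import numpy as np
import torch

def mixup_batch(batch1_x, batch1_y, batch2_x, batch2_y, alpha=1.0, beta=1.0):
    """lambda ~ Beta(alpha, beta); convex-combine two batches of samples
    and their (one-hot) labels, as in the formulas above."""
    lam = np.random.beta(alpha, beta)
    mixed_x = lam * batch1_x + (1 - lam) * batch2_x
    mixed_y = lam * batch1_y + (1 - lam) * batch2_y
    return mixed_x, mixed_y

# usage: labels must be one-hot (or soft) so they can be interpolated
x1, y1 = torch.rand(8, 3, 224, 224), torch.eye(5)[torch.randint(5, (8,))]
x2, y2 = torch.rand(8, 3, 224, 224), torch.eye(5)[torch.randint(5, (8,))]
mx, my = mixup_batch(x1, y1, x2, y2)
```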
Preferably, in step 2: garbage items suffer from deformation, occlusion, widely varying shapes and similar problems, to which ordinary Convolution cannot adapt well; the more effective Deformable Convolution is therefore selected to build the deep residual network ResNet, so that the network can better adapt to garbage samples of different types and shapes.
The Deformable Convolution step:
on the convolution or ROI sampling layer, a displacement (offset) variable is added; this variable is learned from the data. After the offset, each cell of the convolution kernel is in effect stretched or shifted, so the extent of the receptive field changes and the receptive field becomes a polygon.
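A minimal sketch of such a block, assuming torchvision's DeformConv2d; the offsets are predicted by an extra convolution layer (as noted for fig. 1.2 below), and the module name and channel sizes are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """3x3 deformable convolution: a small conv predicts a (dy, dx) offset
    for each of the 9 kernel cells at every position, and DeformConv2d
    samples the input at those shifted locations."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 2 offsets (y, x) per kernel cell -> 2 * 3 * 3 = 18 channels
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))

feat = torch.rand(1, 64, 56, 56)
print(DeformBlock(64, 128)(feat).shape)  # torch.Size([1, 128, 56, 56])
```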
Preferably, in step 3, the multi-scale features fused by ResNet + FPN in step 2 are sent to the RPN network for calculation; a set number of candidate regions with the highest score values are taken, and classification and box-regression operations screen the candidate-box positions to obtain the target-region candidate boxes.
Preferably, in step 3, anchors of different sizes are derived by k-means clustering of the sizes of the garbage target objects, and the corresponding candidate boxes are obtained on the original image according to the anchors; the candidate boxes are classification-trained with smooth L1 loss, and regression-trained with a loss function combining a Repulsion loss step and a CIoU loss step.
Preferably, the k-means clustering step:
<1> first determine a value of k, i.e., the number of sets we want to cluster the data set into;
<2> randomly select k data points from the data set as centroids;
<3> for each point in the data set, compute its distance (e.g., the Euclidean distance) to each centroid, and assign the point to the nearest centroid;
<4> after all the data are assigned, there are k sets in total; then recompute the centroid of each set;
<5> if the distance between each newly computed centroid and the original centroid is below a set threshold (meaning the centroid positions barely change and are stabilizing, i.e., converging), the clustering is considered to have reached the expected result and the algorithm terminates;
<6> if the distance between a new centroid and the original centroid changes greatly, steps <3>-<5> are iterated;
given a training sample is
Figure DEST_PATH_IMAGE003
Each one of which is
Figure DEST_PATH_IMAGE005
I.e. each sample element is an n-dimensional vector; for ease of understanding, a two-dimensional vector is used in the following schematic;
step 1: randomly selecting k clustering centroid points as
Figure 220074DEST_PATH_IMAGE006
step 2: the following procedure is repeated, calculating for each sample i the class to which it should belong
Figure DEST_PATH_IMAGE007
For each class
Figure DEST_PATH_IMAGE009
Recalculating the centroid of the class
Figure 340477DEST_PATH_IMAGE010
Where K is the given number of clusters,
Figure 854635DEST_PATH_IMAGE012
representing the class of sample i that is closest in distance to the k classes,
Figure 984265DEST_PATH_IMAGE012
is one of 1 to k; center of mass
Figure 533058DEST_PATH_IMAGE009
Representing our guess of the sample center points belonging to the same class, the explanation by the star group model is to gather all the stars into k star groups, firstly randomly select the points (or k stars) in the k universes as the centroids of the k star groups, then calculate the distance from each star to each of the k centroids in the first step, and then select the star group with the closest distance as the centroid of the k universes
Figure 89941DEST_PATH_IMAGE012
Thus, each star has a cluster to which the star belongs through the first step; second step for each constellation, recalculate its centroid
Figure 91395DEST_PATH_IMAGE009
(average all star coordinates inside); the first step and the second step are iterated until the centroid is unchanged or changes very little.
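A minimal NumPy sketch of the clustering loop above, applied, as in step 3, to the (width, height) of labeled garbage boxes to derive anchor sizes; the data is synthetic and all names are illustrative:

```python
import numpy as np

def kmeans(X, k, tol=1e-4, max_iter=100, seed=0):
    """Plain k-means over the rows of X (m samples, n dims), following
    steps <1>-<6>: assign each point to its nearest centroid, recompute
    centroids, stop once they move less than tol."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]    # step <2>
    for _ in range(max_iter):
        # step <3>: Euclidean distance of every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = d.argmin(axis=1)                               # class c^(i)
        # step <4>: recompute each centroid as the mean of its cluster
        new = np.array([X[c == j].mean(axis=0) if (c == j).any()
                        else centroids[j] for j in range(k)])
        shift = np.linalg.norm(new - centroids)
        centroids = new
        if shift < tol:                                    # step <5>, else <6>
            break
    return centroids, c

# e.g., cluster (w, h) of labeled garbage boxes into 9 anchor sizes
wh = np.abs(np.random.randn(500, 2)) * [80, 60] + 20
anchors, _ = kmeans(wh, k=9)
```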
Preferably, the Repulsion loss step:
the prediction box P is required to be close to (attracted by) its own real target T, while P is also required to be far from (repelled by) the other real boxes beside T (e.g., B);
<1> Repulsion loss:
L = L_Attr + α·L_RepGT + β·L_RepBox
as can be seen, the loss function contains 3 modules, explained below;
<2> L_Attr:
the role of this module is to make the prediction box as close as possible to its target box;
L_Attr adopts the regression loss used in general target detection; the Euclidean distance, the SmoothL1 distance, or the IoU distance could be adopted, but for comparability with other algorithms the SmoothL1 distance is used here:
L_Attr = ( Σ_{P∈P+} Smooth_L1(B^P, G_Attr^P) ) / |P+|
P ∈ P+ (all positive samples): the positive samples are the set of detection boxes P divided according to a set IoU threshold;
G_Attr^P: the real target box (ground truth) with the maximum IoU value, matched for each detection box P;
B^P: the prediction box obtained after regression shift of the detection box P;
the formula for smooth_L1 is as follows:
smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise
Smooth_L1(B^P, G_Attr^P) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i − v_i)
as the formula shows, smooth_L1 is computed separately on the top-left coordinates and the width and height (x, y, w, h) of B^P and G_Attr^P, and the results are summed;
the optimization aim is to shorten the distance between the prediction box and the target box, so that the prediction box is as close as possible to the target box;
<3> L_RepGT:
L_RepGT makes the prediction box P and the surrounding target boxes G as far apart as possible:
L_RepGT = ( Σ_{P∈P+} Smooth_ln( IoG(B^P, G_Rep^P) ) ) / |P+|
with IoG(B, G) = area(B ∩ G) / area(G), and
Smooth_ln(x) = −ln(1 − x), if x ≤ σ; (x − σ)/(1 − σ) − ln(1 − σ), if x > σ
here the surrounding target box is the target box with the largest IoU apart from the one already matched, and it is denoted G_Rep^P:
G_Rep^P = argmax_{G ∈ 𝒢 \ {G_Attr^P}} IoU(G, P)
that is, among the remaining ground truths 𝒢 other than the ground truth A matched with the prediction box P, G_Rep^P is the ground truth B with the largest IoU with P.
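A sketch of the L_RepGT term under the definitions above, assuming PyTorch and axis-aligned (x1, y1, x2, y2) boxes; choosing G_Rep^P for each prediction is left to the caller, and all names are illustrative:

```python
import math
import torch

def smooth_ln(x, sigma=0.5):
    """Smooth_ln(x): -ln(1 - x) for x <= sigma, linear continuation above."""
    x = x.clamp(max=1 - 1e-6)  # keep the log finite as overlap approaches 1
    return torch.where(x <= sigma,
                       -torch.log1p(-x),
                       (x - sigma) / (1 - sigma) - math.log(1 - sigma))

def iog(pred, gt):
    """Intersection over the *ground-truth* area for (x1, y1, x2, y2) boxes."""
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    gt_area = (gt[:, 2:] - gt[:, :2]).prod(dim=1)
    return inter / gt_area.clamp(min=1e-6)

def rep_gt_loss(pred_boxes, rep_gt_boxes, sigma=0.5):
    """L_RepGT: push each prediction B^P away from its most-overlapping
    non-matched ground truth G_Rep^P (passed row-aligned by the caller)."""
    return smooth_ln(iog(pred_boxes, rep_gt_boxes), sigma).mean()
```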
Preferably, the CIoU step:
the CIOU calculation formula:
CIOU = IoU − ρ²(b, b^gt)/c² − α·v
where ρ²(b, b^gt) represents the squared Euclidean distance between the center points of the prediction box and the real box (i.e., d in the figure), and c represents the diagonal distance of the minimum closure area that can contain the prediction box and the real box at the same time;
where
v = (4/π²)·( arctan(w^gt/h^gt) − arctan(w/h) )²,  α = v / ( (1 − IoU) + v )
LOSS calculation for CIOU regression:
LOSS_CIOU = 1 − IoU + ρ²(b, b^gt)/c² + α·v
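A sketch of LOSS_CIOU above, assuming PyTorch and (x1, y1, x2, y2) boxes; the eps guards are an added assumption for numerical stability:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    """LOSS_CIOU = 1 - IoU + rho^2/c^2 + alpha*v, averaged over box pairs."""
    lt, rb = torch.max(pred[:, :2], gt[:, :2]), torch.min(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).prod(dim=1)
    area_g = (gt[:, 2:] - gt[:, :2]).prod(dim=1)
    iou = inter / (area_p + area_g - inter + eps)
    # rho^2: squared center distance; c^2: squared enclosing-box diagonal
    ctr_p, ctr_g = (pred[:, :2] + pred[:, 2:]) / 2, (gt[:, :2] + gt[:, 2:]) / 2
    rho2 = ((ctr_p - ctr_g) ** 2).sum(dim=1)
    c2 = ((torch.max(pred[:, 2:], gt[:, 2:])
           - torch.min(pred[:, :2], gt[:, :2])) ** 2).sum(dim=1) + eps
    # aspect-ratio consistency term v and its trade-off weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps))
                              - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```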
Preferably, in step 4, according to the loss value obtained by training, the class score with the highest score of each target recommendation region and the coordinates of the recommendation region are obtained; recommendation regions whose highest score belongs to the background are deleted, and recommendation regions whose highest score does not reach the threshold are eliminated; non-maximum suppression (NMS) is performed on the candidate boxes of the same category, the −1 placeholders are removed from the box indices after NMS, and the top n boxes are kept; finally, the information of each box (y1, x1, y2, x2, Class_ID, Score) is returned and bound to the garbage component.
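A sketch of this per-category post-processing, assuming torchvision's batched_nms; the (y1, x1, y2, x2, Class_ID, Score) row layout follows the text, while the thresholds are illustrative:

```python
import torch
from torchvision.ops import batched_nms

def nms_per_class(boxes, scores, classes, iou_thr=0.5, top_n=100):
    """Per-category NMS as in step 4; boxes come in (y1, x1, y2, x2) order,
    and the function returns (y1, x1, y2, x2, Class_ID, Score) rows."""
    xyxy = boxes[:, [1, 0, 3, 2]]                 # to (x1, y1, x2, y2)
    keep = batched_nms(xyxy, scores, classes, iou_thr)[:top_n]
    return torch.cat([boxes[keep],
                      classes[keep, None].float(),
                      scores[keep, None]], dim=1)
```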
Preferably, the loss value obtained by training is computed with Focal loss; the numbers of boxes in the screened garbage target regions and of non-garbage candidate boxes are severely imbalanced between categories, and Focal loss effectively alleviates this class imbalance:
the mathematical definition of Focal loss is as follows:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
when γ = 0, this expression degenerates to Cross Entropy Loss:
CE(p, y) = −log(p), if y = 1; −log(1 − p), otherwise
the true meaning of p_t is defined as follows:
p_t = p, if y = 1; 1 − p, otherwise
combining the two formulas, Cross Entropy Loss becomes:
CE(p_t) = −log(p_t)
adding these weights does help to handle the class imbalance;
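A sketch of the Focal loss above for the binary garbage/non-garbage case, assuming PyTorch; setting gamma to 0 recovers the plain cross entropy, as noted:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) with binary
    targets (1 = garbage box, 0 = non-garbage box)."""
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    ce = F.binary_cross_entropy_with_logits(logits, targets.float(),
                                            reduction="none")  # = -log(p_t)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```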
The RAdam optimizer step:
the core idea is to use exponential moving averages to estimate the first moment (the momentum) and the second moment (the adaptive learning rate) of each component of the gradient, and to use the second moment to normalize the first moment to obtain the update quantity of each step:
m_t = β1·m_{t-1} + (1 − β1)·g_t
v_t = β2·v_{t-1} + (1 − β2)·g_t²
c_t = √(1 − β2^t) / (1 − β1^t)
Δθ_t = −η·c_t·m_t / (√v_t + ε)
where m_t is the first moment (momentum), v_t is the second moment (adaptive learning rate), η is the learning rate, c_t is the bias-correction term that prevents divide-by-zero errors and controls the maximum scale of the update quantity, Δθ is the parameter update quantity, and β1 and β2 are the hyperparameters of the exponential moving averages; the smaller their values, the more the averages lean toward recent (local) gradients;
the adaptive learning rate can have a very large variance in the initial training stage; v_t serves to correct the update direction, which addresses the problem that the variance can be very large, and RAdam uses a warmup (preheating) method to avoid converging to a poor local optimum.
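A sketch of the moment equations above, assuming PyTorch tensors; it shows the plain Adam-style step with the bias correction c_t, and omits RAdam's variance rectification and warmup (in practice torch.optim.RAdam could simply be used):

```python
import torch

def adam_style_step(param, grad, state, lr=1e-3,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """EMA first/second moments, bias correction c_t, then normalize the
    first moment by the second to get the update Δθ_t."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # m_t
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # v_t
    c_t = (1 - beta2 ** t) ** 0.5 / (1 - beta1 ** t)           # c_t
    param -= lr * c_t * state["m"] / (state["v"].sqrt() + eps) # Δθ_t
    return param

w = torch.zeros(3)
st = {"t": 0, "m": torch.zeros(3), "v": torch.zeros(3)}
w = adam_style_step(w, torch.tensor([0.1, -0.2, 0.3]), st)
```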
Based on the whole scheme, the innovation points are as follows:
1. Sample data enhancement:
data enhancement operations are performed on the garbage data samples; the basic and higher-order enhancements expand and diversify the garbage samples, enhancing the usability of the samples.
2. Improvement of the network:
the deep residual network ResNet is built with Deformable Convolution, so that the network can better adapt to garbage samples of different types and shapes.
3. Improvement of the optimizer:
a RAdam optimizer is used to help the network converge better and to improve model precision.
4. Improvement of the loss function:
OHEM is used to handle hard samples, dice loss is combined with focal loss, and the loss result of model training is output.
5. Use of a multiscale structure:
different models are trained with different image sizes, first small-size images and then large-size images for finetuning, so that the network is more robust to images of different sizes.
6. Improved autonomous network training:
samples are labeled and trained with a pseudo-label approach, so that the network can train autonomously.
7. The improved network autonomously performs a reasonable threshold search.
Drawings
Fig. 1.1 is a normal convolution.
Fig. 1.2 is a deformable convolution.
Fig. 1.3 is a special case of a deformable convolution.
Fig. 1.4 is another specific example of a deformable convolution.
Detailed Description
The following describes in detail embodiments of the present invention, specifically as follows:
the embodiment relates to a garbage recognition precision improving method of an automatic bag breaking classification box based on an AI algorithm, which comprises the following steps:
step 1: make the data set; manually collect garbage sample images and label each with the garbage-component label it belongs to; divide the data set into a training set, a verification set and a test set, and normalize and preprocess the data set;
step 2: call a garbage sample image from the training set or verification set, and extract multi-scale features with a feature extraction network; the feature extraction network selects Deformable Convolution to build the deep residual network ResNet, combined with an FPN network;
step 3: send the multi-scale features fused by ResNet + FPN in step 2 into the RPN network, which automatically generates a number of anchors of different sizes; obtain the corresponding candidate boxes on the original image according to the anchors, count the overlapping area of each candidate box with the training-set label region, and screen the target-region candidate boxes according to the size of the overlapping area; obtain a loss value for the screened target-region candidate boxes through a binary classification and regression equation, judge from the loss value whether the object in each target box is garbage, and reject the non-garbage candidate boxes;
step 4: for the garbage target-region candidate boxes screened in step 3, obtain a loss value through a multi-class classification and regression equation; according to the loss value, keep the high-quality, high-precision garbage target candidate boxes while rejecting non-garbage candidate boxes, and bind the features of the garbage target candidate boxes to the garbage components;
step 5: generate segmentation masks of the garbage images, output them in logit form, binarize them with a threshold to produce background and foreground segmentation masks, and evaluate the garbage-component segmentation result with a similarity coefficient; output the garbage-component segmentation result.
This embodiment thus implements the AI-algorithm-based garbage recognition precision improvement method for the automatic bag-breaking classification box.
In a specific embodiment, in step 1, since garbage images are of great variety and the collected garbage samples cannot cover every kind of garbage, the data set is augmented with flipping, rotation, translation, and brightness and color changes, enlarging the data set; for garbage samples of different shapes, the higher-order Mixup, Cutout and Mosaic data enhancement methods are adopted, which increases both the size and the complexity of the data set, helps the network model understand the data more deeply, and facilitates the training and precision improvement of the network model.
The Mixup step: assume batch1_x is a batch of samples and batch1_y is the label set corresponding to that batch; batch2_x is another batch of samples and batch2_y is its corresponding label set; λ is a mixing coefficient drawn from a Beta distribution with parameters α and β. The mixup principle formulas are:
λ ~ Beta(α, β)
mixed_batch_x = λ·batch1_x + (1 − λ)·batch2_x
mixed_batch_y = λ·batch1_y + (1 − λ)·batch2_y
where Beta refers to the Beta distribution, mixed_batch_x is the mixed batch of samples, and mixed_batch_y is the label corresponding to the mixed batch.
In a further preferred embodiment, in step 2: garbage items suffer from deformation, occlusion, varying shapes and similar problems, to which ordinary Convolution cannot adapt well; the more effective Deformable Convolution is therefore selected to build the deep residual network ResNet, so that the network can better adapt to garbage samples of different types and shapes.
The Deformable Convolution step:
on the convolution or ROI sampling layer, a displacement (offset) variable is added; this variable is learned from the data. After the offset, each cell of the convolution kernel is in effect stretched or shifted, so the extent of the receptive field changes and the receptive field becomes a polygon.
Figs. 1.1-1.4 depict the deformable and ordinary convolutions in a two-dimensional plane. Fig. 1.1 shows an ordinary convolution with a 3 x 3 kernel and a very regular, square sampling arrangement. Fig. 1.2 is a deformable convolution: an offset is added to each sampling point (this offset is learned by an extra convolution layer), so the arrangement becomes irregular. Figs. 1.3 and 1.4 are two special cases of the deformable convolution: with the offsets of fig. 1.3, a scaling effect is achieved; with the offsets of fig. 1.4, a rotation effect is achieved.
In a further preferred scheme, in step 3 the multi-scale features fused by ResNet + FPN in step 2 are sent to the RPN network for calculation; a set number of candidate regions with the highest score values are taken, and classification and box-regression operations screen the candidate-box positions to obtain the target-region candidate boxes.
Specifically, in step 3, anchors of different sizes are derived by k-means clustering of the sizes of the garbage target objects, and the corresponding candidate boxes are obtained on the original image according to the anchors; the candidate boxes are classification-trained with smooth L1 loss, and regression-trained with a loss function combining a Repulsion loss step and a CIoU loss step.
Wherein the k-means clustering steps are as follows:
<1> first determine a value of k, i.e., the number of sets we want to cluster the data set into;
<2> randomly select k data points from the data set as centroids;
<3> for each point in the data set, compute its distance (e.g., the Euclidean distance) to each centroid, and assign the point to the nearest centroid;
<4> after all the data are assigned, there are k sets in total; then recompute the centroid of each set;
<5> if the distance between each newly computed centroid and the original centroid is below a set threshold (meaning the centroid positions barely change and are stabilizing, i.e., converging), the clustering is considered to have reached the expected result and the algorithm terminates;
<6> if the distance between a new centroid and the original centroid changes greatly, steps <3>-<5> are iterated;
given training samples {x^(1), ..., x^(m)}, where each x^(i) ∈ R^n, i.e., each sample element is an n-dimensional vector (for ease of understanding, a two-dimensional vector is used in the schematic):
step 1: randomly select k cluster centroids μ_1, μ_2, ..., μ_k ∈ R^n;
step 2: repeat the following process, computing for each sample i the class it should belong to,
c^(i) := argmin_j ||x^(i) − μ_j||²
and recomputing, for each class j, the centroid of the class,
μ_j := ( Σ_{i=1}^{m} 1{c^(i) = j}·x^(i) ) / ( Σ_{i=1}^{m} 1{c^(i) = j} )
where K is the given number of clusters and c^(i) denotes the class among the k classes whose centroid is closest to sample i; c^(i) is one of 1 to k. The centroid μ_j represents our guess at the center point of the samples belonging to the same class. Explained with a constellation model: to gather all the stars into k constellations, first randomly pick k points in space (or k stars) as the centroids of the k constellations; in the first step, compute the distance from each star to each of the k centroids and assign the star to the constellation whose centroid is closest, so that after the first step every star has a constellation it belongs to; in the second step, recompute the centroid μ_j of each constellation (by averaging the coordinates of all the stars inside it); iterate the first and second steps until the centroids no longer change, or change very little.
Wherein the Repulsion loss process is as follows:
the prediction box P is required to be close to (attracted by) its own real target T, while P is also required to be far from (repelled by) the other real boxes beside T (e.g., B);
<1> Repulsion loss:
L = L_Attr + α·L_RepGT + β·L_RepBox
as can be seen, the loss function contains 3 modules, explained below;
<2> L_Attr:
the role of this module is to make the prediction box as close as possible to its target box;
L_Attr adopts the regression loss used in general target detection; the Euclidean distance, the SmoothL1 distance, or the IoU distance could be adopted, but for comparability with other algorithms the SmoothL1 distance is used here:
L_Attr = ( Σ_{P∈P+} Smooth_L1(B^P, G_Attr^P) ) / |P+|
P ∈ P+ (all positive samples): the positive samples are the set of detection boxes P divided according to a set IoU threshold;
G_Attr^P: the real target box (ground truth) with the maximum IoU value, matched for each detection box P;
B^P: the prediction box obtained after regression shift of the detection box P;
the formula for smooth_L1 is as follows:
smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise
Smooth_L1(B^P, G_Attr^P) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i − v_i)
as the formula shows, smooth_L1 is computed separately on the top-left coordinates and the width and height (x, y, w, h) of B^P and G_Attr^P, and the results are summed;
the optimization aim is to shorten the distance between the prediction box and the target box, so that the prediction box is as close as possible to the target box;
<3> L_RepGT:
L_RepGT makes the prediction box P and the surrounding target boxes G as far apart as possible:
L_RepGT = ( Σ_{P∈P+} Smooth_ln( IoG(B^P, G_Rep^P) ) ) / |P+|
with IoG(B, G) = area(B ∩ G) / area(G), and
Smooth_ln(x) = −ln(1 − x), if x ≤ σ; (x − σ)/(1 − σ) − ln(1 − σ), if x > σ
here the surrounding target box is the target box with the largest IoU apart from the one already matched, and it is denoted G_Rep^P:
G_Rep^P = argmax_{G ∈ 𝒢 \ {G_Attr^P}} IoU(G, P)
that is, among the remaining ground truths 𝒢 other than the ground truth A matched with the prediction box P, G_Rep^P is the ground truth B with the largest IoU with P.
Wherein the CIoU steps are as follows:
the CIOU calculation formula:
CIOU = IoU − ρ²(b, b^gt)/c² − α·v
where ρ²(b, b^gt) represents the squared Euclidean distance between the center points of the prediction box and the real box (i.e., d in the figure), and c represents the diagonal distance of the minimum closure area that can contain the prediction box and the real box at the same time;
where
v = (4/π²)·( arctan(w^gt/h^gt) − arctan(w/h) )²,  α = v / ( (1 − IoU) + v )
LOSS calculation in CIOU regression:
LOSS_CIOU = 1 − IoU + ρ²(b, b^gt)/c² + α·v
In a further preferred scheme, in step 4, according to the loss value obtained by training, the class score with the highest score of each target recommendation region and the coordinates of the recommendation region are obtained; recommendation regions whose highest score belongs to the background are deleted, and recommendation regions whose highest score does not reach the threshold are eliminated; non-maximum suppression (NMS) is performed on the candidate boxes of the same category, the −1 placeholders are removed from the box indices after NMS, and the top n boxes are kept; finally, the information of each box (y1, x1, y2, x2, Class_ID, Score) is returned and bound to the garbage component. Specifically, the loss value obtained by training is computed with Focal loss; the numbers of boxes in the screened garbage target regions and of non-garbage candidate boxes are severely imbalanced between categories, and the following Focal loss effectively alleviates this class imbalance:
the mathematical definition of Focal loss is as follows:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
when γ = 0, this expression degenerates to Cross Entropy Loss:
CE(p, y) = −log(p), if y = 1; −log(1 − p), otherwise
the true meaning of p_t is defined as follows:
p_t = p, if y = 1; 1 − p, otherwise
combining the two formulas, Cross Entropy Loss becomes:
CE(p_t) = −log(p_t)
adding these weights does help to handle the class imbalance;
the Radam optimizer comprises the following steps:
the core idea is to estimate the first moment (momentum) and the second moment (adaptive learning rate) of each component of the gradient by exponential moving average, and remove the first moment of the normalization by using the second moment to obtain the update quantity of each step:
Figure 421632DEST_PATH_IMAGE028
wherein mt is a first moment (momentum), vt is a second moment (adaptive learning rate), η is a learning rate, ct is a bias correction term (bias correction), the maximum scale for preventing zero-division errors and controlling updating amount, Δ θ is a parameter updating amount, β 1 and β 2 are hyperparameters of exponential moving average, and the smaller the value is, the more the local average is inclined;
RAdam can have very large variance in the initial training stage, plays a role in correcting the updating direction by vt, solves the problem that the variance can be very large, and solves the problem of convergence to a local optimal solution by using a preheating (warmup) method.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that those skilled in the art may make variations, modifications, substitutions and alterations within the scope of the present invention without departing from the spirit and scope of the present invention.

Claims (8)

1. The method for improving the garbage recognition precision of the automatic bag breaking classification box based on the AI algorithm is characterized by comprising the following steps:
step 1: making the data set; manually collecting garbage sample images and labeling each with the garbage-component label it belongs to; dividing the data set into a training set, a verification set and a test set, and normalizing and preprocessing the data set;
step 2: calling a garbage sample image from the training set or verification set, and extracting multi-scale features with a feature extraction network; the feature extraction network selects Deformable Convolution to build the deep residual network ResNet, combined with an FPN network;
step 3: sending the multi-scale features fused by ResNet + FPN in step 2 into the RPN network, which automatically generates a number of anchors of different sizes; obtaining the corresponding candidate boxes on the original image according to the anchors, counting the overlapping area of each candidate box with the training-set label region, and screening the target-region candidate boxes according to the size of the overlapping area; obtaining a loss value for the screened target-region candidate boxes through a binary classification and regression equation, judging from the loss value whether the object in each target box is garbage, and rejecting the non-garbage candidate boxes; specifically: in step 3, the multi-scale features fused by ResNet and FPN in step 2 are sent into the RPN for calculation, a set number of candidate regions with the highest score values are taken, and classification and box-regression operations screen the candidate-box positions to obtain the target-region candidate boxes; in step 3, anchors of different sizes are derived by k-means clustering of the sizes of the garbage target objects, and the corresponding candidate boxes are obtained on the original image according to the anchors; the candidate boxes are classification-trained with smooth L1 loss, and regression-trained with a loss function combining a Repulsion loss step and a CIoU loss step;
step 4: for the garbage target-region candidate boxes screened in step 3, obtaining a loss value through a multi-class classification and regression equation; according to the loss value, keeping the high-quality, high-precision garbage target candidate boxes while rejecting non-garbage candidate boxes, and binding the features of the garbage target candidate boxes to the garbage components;
step 5: generating segmentation masks of the garbage images, outputting them in logit form, binarizing them with a threshold to produce background and foreground segmentation masks, and evaluating the garbage-component segmentation result with a similarity coefficient; and outputting the garbage-component segmentation result.
2. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein: in step 1, data enhancement operations of flipping, rotation, translation, and brightness and color changes are performed on the data set, enlarging the data set; for garbage samples of different shapes, the higher-order Mixup, Cutout and Mosaic data enhancement methods are adopted, increasing the complexity of the data set.
3. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein: the Deformable Convolution step in step 2 is as follows: on the convolution or ROI sampling layer, a displacement (offset) variable is added and learned from the data; after the offset, each cell of the convolution kernel is in effect stretched or shifted, so that the extent of the receptive field changes and the receptive field becomes a polygon.
4. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein:
the k-means clustering step:
<1> first determine a value of k;
<2> randomly select k data points from the data set as centroids;
<3> compute the distance between each point in the data set and each centroid, and assign each point to its nearest centroid;
<4> after all the data are assigned, there are k sets in total; then recompute the centroid of each set;
<5> if the distance between each newly computed centroid and the original centroid is less than a set threshold, the algorithm terminates;
<6> if the distance between a new centroid and the original centroid changes greatly, steps <3>-<5> are iterated;
given training samples {x^(1), ..., x^(m)}, each x^(i) ∈ R^n, i.e., each sample element is an n-dimensional vector;
step 1: randomly select k cluster centroids μ_1, μ_2, ..., μ_k ∈ R^n;
step 2: repeat the following process, computing for each sample i the class it should belong to,
c^(i) := argmin_j ||x^(i) − μ_j||²
and recomputing, for each class μ_j, the centroid of the class,
μ_j := ( Σ_{i=1}^{m} 1{c^(i) = j}·x^(i) ) / ( Σ_{i=1}^{m} 1{c^(i) = j} )
where K is the given number of clusters and c^(i) represents the class among the k classes whose centroid is closest to sample i; c^(i) is one of 1 to k; the centroid μ_j represents a guess at the center point of the samples belonging to the same class.
5. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein:
the Repulsion loss step:
the prediction box P is required to be close to its own real target T, while the prediction box P is also required to be far from the other real boxes beside T;
<1> Repulsion loss:
L = L_Attr + α·L_RepGT + β·L_RepBox
the loss function includes 3 modules, explained below;
<2> L_Attr:
the role of this module is to bring the prediction box as close as possible to its target box;
L_Attr adopts the regression loss used in general target detection, here the SmoothL1 distance:
L_Attr = ( Σ_{P∈P+} Smooth_L1(B^P, G_Attr^P) ) / |P+|
P ∈ P+: the positive samples are the set of detection boxes P divided according to a set IoU threshold;
G_Attr^P: the real target box with the maximum IoU value, matched for each detection box P;
B^P: the prediction box obtained after regression shift of the detection box P;
the formula for smooth_L1 is as follows:
smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise
Smooth_L1(B^P, G_Attr^P) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i − v_i)
wherein v_i is the actual standard (ground-truth) value and t_i is the value predicted by the convolutional neural network;
as the formula shows, smooth_L1 is computed separately on the top-left coordinates and the width and height (x, y, w, h) of B^P and G_Attr^P, and the results are summed;
the optimization aim is to shorten the distance between the prediction box and the target box, so that the prediction box is as close as possible to the target box;
<3> L_RepGT:
L_RepGT makes the prediction box P and the surrounding target boxes G as far apart as possible:
L_RepGT = ( Σ_{P∈P+} Smooth_ln( IoG(B^P, G_Rep^P) ) ) / |P+|
Smooth_ln(x) = −ln(1 − x), if x ≤ σ; (x − σ)/(1 − σ) − ln(1 − σ), if x > σ
wherein σ ∈ [0, 1] is a smoothing parameter used to adjust the sensitivity of the repulsion loss to outliers;
here the surrounding target box is the target box with the largest IoU apart from the one already matched, denoted
G_Rep^P = argmax_{G ∈ 𝒢 \ {G_Attr^P}} IoU(G, P)
wherein 𝒢 is the set of manually labeled (ground-truth) boxes; that is, apart from the ground truth A matched with the prediction box P, G_Rep^P is the ground truth B with the largest IoU with P among the remaining ground truths.
6. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein:
the CIoU step:
the CIOU calculation formula:
CIOU = IoU − ρ²(b, b^gt)/c² − α·v
where ρ²(b, b^gt) represents the squared Euclidean distance between the center points of the prediction box and the real box, and c represents the diagonal distance of the minimum closure area that can simultaneously contain the prediction box and the real box;
where
v = (4/π²)·( arctan(w^gt/h^gt) − arctan(w/h) )²
α = v / ( (1 − IoU) + v )
wherein w^gt and h^gt denote the width and the height of the manually labeled box, and w and h are the width and the height of the prediction box;
LOSS calculation in CIOU regression:
LOSS_CIOU = 1 − IoU + ρ²(b, b^gt)/c² + α·v
7. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1 or 2, wherein: in step 4, according to the loss value obtained by training, the class score with the highest score of each target recommendation region and the coordinates of the recommendation region are obtained; recommendation regions whose highest score belongs to the background are deleted, and recommendation regions whose highest score does not reach the threshold are eliminated; non-maximum suppression (NMS) is performed on the candidate boxes of the same category, the −1 placeholders are removed from the box indices after NMS, and the top n boxes are kept; finally, the information of each box (y1, x1, y2, x2, Class_ID, Score) is returned; and this information is bound to the garbage component.
8. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 7, characterized in that: the loss value obtained by training is computed with Focal loss; the numbers of boxes in the screened garbage target regions and of non-garbage candidate boxes are severely imbalanced between categories, which Focal loss effectively alleviates; the Focal loss steps are as follows:
the mathematical definition of Focal loss is as follows:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
wherein α_t is set to control the shared weight of positive and negative samples on the total loss; taking a smaller α_t reduces the weight of the negative samples;
when γ is 0, this expression degenerates to Cross Entropy Loss:
CE(p, y) = −log(p), if y = 1; −log(1 − p), otherwise
the definition of p_t, with its true meaning, is as follows:
p_t = p, if y = 1; 1 − p, otherwise
combining the two formulas, Cross Entropy Loss becomes:
CE(p_t) = −log(p_t)
adding these weights does help to handle the class imbalance;
the RAdam optimizer steps are as follows:
the core idea is to estimate the first moment and the second moment of each component of the gradient by exponential moving averages, and to normalize the first moment by the second moment to obtain the update quantity of each step:
m_t = β1·m_{t-1} + (1 − β1)·g_t
v_t = β2·v_{t-1} + (1 − β2)·g_t²
c_t = √(1 − β2^t) / (1 − β1^t)
Δθ_t = −η·c_t·m_t / (√v_t + ε)
wherein g_t is the gradient of the objective function, m_t is the first moment, v_t is the second moment, η is the learning rate, c_t is the bias-correction term that prevents divide-by-zero errors and controls the maximum scale of the update quantity, Δθ is the parameter update quantity, ε is a small number ensuring that the denominator is not 0, β1 and β2 are constant decay rates whose smaller values lean the averages toward recent (local) gradients, and β1^t and β2^t are the decay rates varying over time;
the adaptive learning rate can have a very large variance at the beginning of training; v_t serves to correct the update direction, addressing the problem that the variance can be very large, and a warmup (preheating) method is used to avoid converging to a poor local optimum.
CN202110892805.1A 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box Active CN113743470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110892805.1A CN113743470B (en) 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110892805.1A CN113743470B (en) 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Publications (2)

Publication Number Publication Date
CN113743470A CN113743470A (en) 2021-12-03
CN113743470B (en) 2022-08-23

Family

ID=78730108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110892805.1A Active CN113743470B (en) 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Country Status (1)

Country Link
CN (1) CN113743470B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782759B (en) * 2022-06-22 2022-09-13 鲁东大学 Method for detecting densely-occluded fish based on YOLOv5 network
CN115393892B (en) * 2022-07-20 2023-08-04 东北电力大学 Congestion scene pedestrian detection method based on improved double-candidate-frame cross replacement strategy and loss function
CN114937179B (en) * 2022-07-27 2022-12-13 深圳市海清数字技术有限公司 Junk image classification method and device, electronic equipment and storage medium
CN115471871A (en) * 2022-09-22 2022-12-13 四川农业大学 Sheldrake gender classification and identification method based on target detection and classification network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109201514B (en) * 2017-06-30 2019-11-08 京东方科技集团股份有限公司 Waste sorting recycle method, garbage classification device and classified-refuse recovery system
CN108830205B (en) * 2018-06-04 2019-06-14 江南大学 Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network
CN110852420B (en) * 2019-11-11 2021-04-13 北京智能工场科技有限公司 Garbage classification method based on artificial intelligence
CN111079639B (en) * 2019-12-13 2023-09-19 中国平安财产保险股份有限公司 Method, device, equipment and storage medium for constructing garbage image classification model
CN111259809B (en) * 2020-01-17 2021-08-17 五邑大学 Unmanned aerial vehicle coastline floating garbage inspection system based on DANet
CN112016462A (en) * 2020-08-28 2020-12-01 佛山市南海区广工大数控装备协同创新研究院 Recovery bottle classification method based on deep learning model

Also Published As

Publication number Publication date
CN113743470A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN113743470B (en) AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
CN111191732B (en) Target detection method based on full-automatic learning
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN106960214B (en) Object recognition method based on image
CN111695482A (en) Pipeline defect identification method
CN110110802A (en) Airborne laser point cloud classification method based on high-order condition random field
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN112101430B (en) Anchor frame generation method for image target detection processing and lightweight target detection method
CN107633226B (en) Human body motion tracking feature processing method
Duque-Arias et al. On power Jaccard losses for semantic segmentation
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN107341447A (en) A kind of face verification mechanism based on depth convolutional neural networks and evidence k nearest neighbor
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN103559504A (en) Image target category identification method and device
CN111161244B (en) Industrial product surface defect detection method based on FCN + FC-WXGboost
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN109284779A (en) Object detecting method based on the full convolutional network of depth
CN108154158B (en) Building image segmentation method for augmented reality application
CN112802054A (en) Mixed Gaussian model foreground detection method fusing image segmentation
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN104680193A (en) Online target classification method and system based on fast similarity network fusion algorithm
CN113705579A (en) Automatic image annotation method driven by visual saliency
CN111488911A (en) Image entity extraction method based on Mask R-CNN and GAN
Zeng et al. Steel sheet defect detection based on deep learning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant