CN113743470B - AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box - Google Patents

AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Info

Publication number
CN113743470B
CN113743470B
Authority
CN
China
Prior art keywords
garbage
frame
loss
target
centroid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110892805.1A
Other languages
Chinese (zh)
Other versions
CN113743470A (en)
Inventor
鲍承德
黄正
陈洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lianyun Environment Engineering Co ltd
Original Assignee
Zhejiang Lianyun Environment Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lianyun Environment Engineering Co ltd filed Critical Zhejiang Lianyun Environment Engineering Co ltd
Priority to CN202110892805.1A priority Critical patent/CN113743470B/en
Publication of CN113743470A publication Critical patent/CN113743470A/en
Application granted granted Critical
Publication of CN113743470B publication Critical patent/CN113743470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02W CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W 30/00 Technologies for solid waste management
    • Y02W 30/10 Waste collection, transportation, transfer or storage, e.g. segregated refuse collecting, electric or hybrid propulsion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the field of garbage recognition algorithms, and in particular to a method, based on an AI algorithm, for improving the garbage recognition precision of an automatic bag-breaking classification box, comprising the following steps. Step 1: make a data set. Step 2: call a garbage sample image from the training set or verification set and extract multi-scale features with a feature extraction network. Step 3: send the multi-scale features fused by ResNet + FPN in step 2 into an RPN network, and screen the garbage target-region candidate boxes. Step 4: from the garbage target-region boxes screened in step 3, keep the high-quality, high-precision garbage target candidate boxes while rejecting non-garbage candidate boxes, and bind the features of the garbage target candidate boxes to the garbage components. Step 5: evaluate the garbage-component segmentation result and output it. With these steps, the method enables garbage classification-recognition training of the system, improves model precision, and thereby improves the recognition rate of garbage components.

Description

AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
Technical Field
The invention relates to the field of garbage recognition algorithms, in particular to a garbage recognition precision improving method of an automatic bag breaking classification box based on an AI algorithm.
Background
With rising living standards and growing consumption, cities generate ever more garbage, the problem of environmental pollution grows more serious, and garbage recycling has become an important part of the sustainable development strategy. Because garbage is usually a mixture of many types of objects, it must be sorted before it is recycled, so that each type of garbage can be recovered and treated separately, and sorted disposal must start at the source. In current garbage classification processing, a garbage image training set is obtained and its feature vectors are extracted with a convolutional neural network; the feature vectors are fed into a single nonlinear classifier, which is trained with supervision from the garbage label information to obtain a classification model, and target garbage pictures are classified by this model to obtain the garbage classification result. However, the network built by this method is shallow, so its ability to fit the features of garbage pictures is weak and its classification accuracy is low; the garbage classification model therefore classifies garbage with low accuracy.
Disclosure of Invention
In order to solve the problems, the invention aims to provide a garbage identification precision improving method of an automatic bag-breaking classification box based on an AI algorithm.
In order to achieve the purpose, the invention adopts the following technical scheme:
the AI algorithm-based garbage recognition precision improvement method for the automatic bag breaking classification box is characterized by comprising the following steps of: the method comprises the following steps:
step 1: making a data set; manually collecting a rubbish sample image, and labeling a rubbish component label to which the rubbish sample image belongs; dividing the data set into a training set, a verification set and a test set, and carrying out normalization and pretreatment on the data set;
step 2: calling a garbage sample image in a training set or a verification set, and extracting multi-scale features by using a feature extraction network; the method comprises the following steps that a feature extraction network selects Deformable Convolution constraint to build a deep residual error network ResNet, and is combined with an FPN network;
and step 3: sending the multi-scale features extracted by ResNet + FPN fusion in the step2 into an RPN network, wherein the RPN network can automatically generate a plurality of anchors with different sizes, obtaining corresponding candidate frames on an original image according to the anchors, counting the overlapping area of the candidate frames and the training set label area, and screening target area frame selection according to the size of the overlapping area; obtaining a loss value of the screened target area candidate frames through a secondary classification and regression equation, judging whether the objects in the target frames are garbage or not according to the loss value, and simultaneously rejecting non-garbage candidate frames;
and 4, step 4: selecting frames of the garbage target area screened in the step 3, obtaining a loss value through a multi-classification and regression equation, reserving a high-quality and high-precision garbage target candidate frame according to the loss value, simultaneously rejecting a non-garbage candidate frame, and binding the characteristics of the garbage target candidate frame with garbage components;
and 5: generating segmentation masks of the garbage images, outputting the segmentation masks in a log mode, carrying out binarization by using a threshold value, generating segmentation masks of a background and a foreground, and evaluating a garbage component segmentation result by using a similarity coefficient; and outputting the garbage component segmentation result.
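As an illustration of step 5, the following is a minimal sketch assuming PyTorch, with the Dice coefficient standing in for the similarity coefficient; the function name and the 0.5 threshold are illustrative assumptions, not taken from the patent:

```python
import torch

def binarize_and_dice(mask_logits, gt_mask, thresh=0.5):
    """Step 5 sketch: turn mask logits into a binary foreground/background
    mask, then score it against the ground-truth mask with the Dice
    coefficient 2|A∩B| / (|A| + |B|)."""
    prob = torch.sigmoid(mask_logits)      # logits -> probabilities
    pred = (prob > thresh).float()         # threshold binarization
    inter = (pred * gt_mask).sum()
    dice = 2 * inter / (pred.sum() + gt_mask.sum() + 1e-6)
    return pred, dice.item()
```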
By adopting the above technical scheme, the method enables garbage classification-recognition training of the system, improves model precision, and thereby improves the recognition rate of garbage components.
Preferably, in step 1, since garbage images are of great variety and the collected garbage samples cannot cover every kind of garbage, the data set is augmented with flipping, rotation, translation, and brightness and color changes, enlarging the data set; for garbage samples of different shapes, the higher-order Mixup, Cutout and Mosaic data enhancement methods are adopted, which increases both the size and the complexity of the data set, helps the network model understand the data more deeply, and facilitates the training and precision improvement of the network model.
The Mixup step: assume batch1_x is a batch of samples and batch1_y is the label set corresponding to that batch; batch2_x is another batch of samples and batch2_y is its corresponding label set; λ is a mixing coefficient drawn from a Beta distribution with parameters α and β. The mixup principle formulas are:
λ ~ Beta(α, β)
mixed_batch_x = λ·batch1_x + (1 − λ)·batch2_x
mixed_batch_y = λ·batch1_y + (1 − λ)·batch2_y
where Beta refers to the Beta distribution, mixed_batch_x is the mixed batch of samples, and mixed_batch_y is the label corresponding to the mixed batch.
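A minimal sketch of the Mixup formulas above, assuming PyTorch tensors and one-hot labels; the function name and shapes are illustrative:

```python
import numpy as np
import torch

def mixup_batch(batch1_x, batch1_y, batch2_x, batch2_y, alpha=1.0, beta=1.0):
    """lambda ~ Beta(alpha, beta); convex-combine two batches of samples
    and their (one-hot) labels, as in the formulas above."""
    lam = np.random.beta(alpha, beta)
    mixed_x = lam * batch1_x + (1 - lam) * batch2_x
    mixed_y = lam * batch1_y + (1 - lam) * batch2_y
    return mixed_x, mixed_y

# usage: labels must be one-hot (or soft) so they can be interpolated
x1, y1 = torch.rand(8, 3, 224, 224), torch.eye(5)[torch.randint(5, (8,))]
x2, y2 = torch.rand(8, 3, 224, 224), torch.eye(5)[torch.randint(5, (8,))]
mx, my = mixup_batch(x1, y1, x2, y2)
```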
Preferably, in step 2: garbage items suffer from deformation, occlusion, widely varying shapes and similar problems, to which ordinary Convolution cannot adapt well; the more effective Deformable Convolution is therefore selected to build the deep residual network ResNet, so that the network can better adapt to garbage samples of different types and shapes.
The Deformable Convolution step:
on the convolution or ROI sampling layer, a displacement (offset) variable is added; this variable is learned from the data. After the offset, each cell of the convolution kernel is in effect stretched or shifted, so the extent of the receptive field changes and the receptive field becomes a polygon.
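A minimal sketch of such a block, assuming torchvision's DeformConv2d; the offsets are predicted by an extra convolution layer (as noted for fig. 1.2 below), and the module name and channel sizes are illustrative:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """3x3 deformable convolution: a small conv predicts a (dy, dx) offset
    for each of the 9 kernel cells at every position, and DeformConv2d
    samples the input at those shifted locations."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # 2 offsets (y, x) per kernel cell -> 2 * 3 * 3 = 18 channels
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))

feat = torch.rand(1, 64, 56, 56)
print(DeformBlock(64, 128)(feat).shape)  # torch.Size([1, 128, 56, 56])
```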
Preferably, in step 3, the multi-scale features fused by ResNet + FPN in step 2 are sent to the RPN network for calculation; a set number of candidate regions with the highest score values are taken, and classification and box-regression operations screen the candidate-box positions to obtain the target-region candidate boxes.
Preferably, in step 3, anchors of different sizes are derived by k-means clustering of the sizes of the garbage target objects, and the corresponding candidate boxes are obtained on the original image according to the anchors; the candidate boxes are classification-trained with smooth L1 loss, and regression-trained with a loss function combining a Repulsion loss step and a CIoU loss step.
Preferably, the k-means clustering step:
<1> first determine a value of k, i.e., the number of sets we want to cluster the data set into;
<2> randomly select k data points from the data set as centroids;
<3> for each point in the data set, compute its distance (e.g., the Euclidean distance) to each centroid, and assign the point to the nearest centroid;
<4> after all the data are assigned, there are k sets in total; then recompute the centroid of each set;
<5> if the distance between each newly computed centroid and the original centroid is below a set threshold (meaning the centroid positions barely change and are stabilizing, i.e., converging), the clustering is considered to have reached the expected result and the algorithm terminates;
<6> if the distance between a new centroid and the original centroid changes greatly, steps <3>-<5> are iterated;
given a training sample is
Figure DEST_PATH_IMAGE003
Each one of which is
Figure DEST_PATH_IMAGE005
I.e. each sample element is an n-dimensional vector; for ease of understanding, a two-dimensional vector is used in the following schematic;
step 1: randomly selecting k clustering centroid points as
Figure 220074DEST_PATH_IMAGE006
step 2: the following procedure is repeated, calculating for each sample i the class to which it should belong
Figure DEST_PATH_IMAGE007
For each class
Figure DEST_PATH_IMAGE009
Recalculating the centroid of the class
Figure 340477DEST_PATH_IMAGE010
Where K is the given number of clusters,
Figure 854635DEST_PATH_IMAGE012
representing the class of sample i that is closest in distance to the k classes,
Figure 984265DEST_PATH_IMAGE012
is one of 1 to k; center of mass
Figure 533058DEST_PATH_IMAGE009
Representing our guess of the sample center points belonging to the same class, the explanation by the star group model is to gather all the stars into k star groups, firstly randomly select the points (or k stars) in the k universes as the centroids of the k star groups, then calculate the distance from each star to each of the k centroids in the first step, and then select the star group with the closest distance as the centroid of the k universes
Figure 89941DEST_PATH_IMAGE012
Thus, each star has a cluster to which the star belongs through the first step; second step for each constellation, recalculate its centroid
Figure 91395DEST_PATH_IMAGE009
(average all star coordinates inside); the first step and the second step are iterated until the centroid is unchanged or changes very little.
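A minimal NumPy sketch of the clustering loop above, applied, as in step 3, to the (width, height) of labeled garbage boxes to derive anchor sizes; the data is synthetic and all names are illustrative:

```python
import numpy as np

def kmeans(X, k, tol=1e-4, max_iter=100, seed=0):
    """Plain k-means over the rows of X (m samples, n dims), following
    steps <1>-<6>: assign each point to its nearest centroid, recompute
    centroids, stop once they move less than tol."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]    # step <2>
    for _ in range(max_iter):
        # step <3>: Euclidean distance of every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = d.argmin(axis=1)                               # class c^(i)
        # step <4>: recompute each centroid as the mean of its cluster
        new = np.array([X[c == j].mean(axis=0) if (c == j).any()
                        else centroids[j] for j in range(k)])
        shift = np.linalg.norm(new - centroids)
        centroids = new
        if shift < tol:                                    # step <5>, else <6>
            break
    return centroids, c

# e.g., cluster (w, h) of labeled garbage boxes into 9 anchor sizes
wh = np.abs(np.random.randn(500, 2)) * [80, 60] + 20
anchors, _ = kmeans(wh, k=9)
```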
Preferably, the Repulsion loss step:
the prediction box P is required to be close to (attracted by) its own real target T, while P is also required to be far from (repelled by) the other real boxes beside T (e.g., B);
<1> Repulsion loss:
L = L_Attr + α·L_RepGT + β·L_RepBox
as can be seen, the loss function contains 3 modules, explained below;
<2> L_Attr:
the role of this module is to make the prediction box as close as possible to its target box;
L_Attr adopts the regression loss used in general target detection; the Euclidean distance, the SmoothL1 distance, or the IoU distance could be adopted, but for comparability with other algorithms the SmoothL1 distance is used here:
L_Attr = ( Σ_{P∈P+} Smooth_L1(B^P, G_Attr^P) ) / |P+|
P ∈ P+ (all positive samples): the positive samples are the set of detection boxes P divided according to a set IoU threshold;
G_Attr^P: the real target box (ground truth) with the maximum IoU value, matched for each detection box P;
B^P: the prediction box obtained after regression shift of the detection box P;
the formula for smooth_L1 is as follows:
smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise
Smooth_L1(B^P, G_Attr^P) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i − v_i)
as the formula shows, smooth_L1 is computed separately on the top-left coordinates and the width and height (x, y, w, h) of B^P and G_Attr^P, and the results are summed;
the optimization aim is to shorten the distance between the prediction box and the target box, so that the prediction box is as close as possible to the target box;
<3> L_RepGT:
L_RepGT makes the prediction box P and the surrounding target boxes G as far apart as possible:
L_RepGT = ( Σ_{P∈P+} Smooth_ln( IoG(B^P, G_Rep^P) ) ) / |P+|
with IoG(B, G) = area(B ∩ G) / area(G), and
Smooth_ln(x) = −ln(1 − x), if x ≤ σ; (x − σ)/(1 − σ) − ln(1 − σ), if x > σ
here the surrounding target box is the target box with the largest IoU apart from the one already matched, and it is denoted G_Rep^P:
G_Rep^P = argmax_{G ∈ 𝒢 \ {G_Attr^P}} IoU(G, P)
that is, among the remaining ground truths 𝒢 other than the ground truth A matched with the prediction box P, G_Rep^P is the ground truth B with the largest IoU with P.
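A sketch of the L_RepGT term under the definitions above, assuming PyTorch and axis-aligned (x1, y1, x2, y2) boxes; choosing G_Rep^P for each prediction is left to the caller, and all names are illustrative:

```python
import math
import torch

def smooth_ln(x, sigma=0.5):
    """Smooth_ln(x): -ln(1 - x) for x <= sigma, linear continuation above."""
    x = x.clamp(max=1 - 1e-6)  # keep the log finite as overlap approaches 1
    return torch.where(x <= sigma,
                       -torch.log1p(-x),
                       (x - sigma) / (1 - sigma) - math.log(1 - sigma))

def iog(pred, gt):
    """Intersection over the *ground-truth* area for (x1, y1, x2, y2) boxes."""
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    gt_area = (gt[:, 2:] - gt[:, :2]).prod(dim=1)
    return inter / gt_area.clamp(min=1e-6)

def rep_gt_loss(pred_boxes, rep_gt_boxes, sigma=0.5):
    """L_RepGT: push each prediction B^P away from its most-overlapping
    non-matched ground truth G_Rep^P (passed row-aligned by the caller)."""
    return smooth_ln(iog(pred_boxes, rep_gt_boxes), sigma).mean()
```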
Preferably, the CIoU step:
the CIOU calculation formula:
CIOU = IoU − ρ²(b, b^gt)/c² − α·v
where ρ²(b, b^gt) represents the squared Euclidean distance between the center points of the prediction box and the real box (i.e., d in the figure), and c represents the diagonal distance of the minimum closure area that can contain the prediction box and the real box at the same time;
where
v = (4/π²)·( arctan(w^gt/h^gt) − arctan(w/h) )²,  α = v / ( (1 − IoU) + v )
LOSS calculation for CIOU regression:
LOSS_CIOU = 1 − IoU + ρ²(b, b^gt)/c² + α·v
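A sketch of LOSS_CIOU above, assuming PyTorch and (x1, y1, x2, y2) boxes; the eps guards are an added assumption for numerical stability:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    """LOSS_CIOU = 1 - IoU + rho^2/c^2 + alpha*v, averaged over box pairs."""
    lt, rb = torch.max(pred[:, :2], gt[:, :2]), torch.min(pred[:, 2:], gt[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).prod(dim=1)
    area_g = (gt[:, 2:] - gt[:, :2]).prod(dim=1)
    iou = inter / (area_p + area_g - inter + eps)
    # rho^2: squared center distance; c^2: squared enclosing-box diagonal
    ctr_p, ctr_g = (pred[:, :2] + pred[:, 2:]) / 2, (gt[:, :2] + gt[:, 2:]) / 2
    rho2 = ((ctr_p - ctr_g) ** 2).sum(dim=1)
    c2 = ((torch.max(pred[:, 2:], gt[:, 2:])
           - torch.min(pred[:, :2], gt[:, :2])) ** 2).sum(dim=1) + eps
    # aspect-ratio consistency term v and its trade-off weight alpha
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wg / (hg + eps))
                              - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```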
Preferably, in step 4, according to the loss value obtained by training, the class score with the highest score of each target recommendation region and the coordinates of the recommendation region are obtained; recommendation regions whose highest score belongs to the background are deleted, and recommendation regions whose highest score does not reach the threshold are eliminated; non-maximum suppression (NMS) is performed on the candidate boxes of the same category, the −1 placeholders are removed from the box indices after NMS, and the top n boxes are kept; finally, the information of each box (y1, x1, y2, x2, Class_ID, Score) is returned and bound to the garbage component.
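A sketch of this per-category post-processing, assuming torchvision's batched_nms; the (y1, x1, y2, x2, Class_ID, Score) row layout follows the text, while the thresholds are illustrative:

```python
import torch
from torchvision.ops import batched_nms

def nms_per_class(boxes, scores, classes, iou_thr=0.5, top_n=100):
    """Per-category NMS as in step 4; boxes come in (y1, x1, y2, x2) order,
    and the function returns (y1, x1, y2, x2, Class_ID, Score) rows."""
    xyxy = boxes[:, [1, 0, 3, 2]]                 # to (x1, y1, x2, y2)
    keep = batched_nms(xyxy, scores, classes, iou_thr)[:top_n]
    return torch.cat([boxes[keep],
                      classes[keep, None].float(),
                      scores[keep, None]], dim=1)
```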
Preferably, the loss value obtained by training is computed with Focal loss; the numbers of boxes in the screened garbage target regions and of non-garbage candidate boxes are severely imbalanced between categories, and Focal loss effectively alleviates this class imbalance:
the mathematical definition of Focal loss is as follows:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
when γ = 0, this expression degenerates to Cross Entropy Loss:
CE(p, y) = −log(p), if y = 1; −log(1 − p), otherwise
the true meaning of p_t is defined as follows:
p_t = p, if y = 1; 1 − p, otherwise
combining the two formulas, Cross Entropy Loss becomes:
CE(p_t) = −log(p_t)
adding these weights does help to handle the class imbalance;
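A sketch of the Focal loss above for the binary garbage/non-garbage case, assuming PyTorch; setting gamma to 0 recovers the plain cross entropy, as noted:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) with binary
    targets (1 = garbage box, 0 = non-garbage box)."""
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    ce = F.binary_cross_entropy_with_logits(logits, targets.float(),
                                            reduction="none")  # = -log(p_t)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```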
The RAdam optimizer step:
the core idea is to use exponential moving averages to estimate the first moment (the momentum) and the second moment (the adaptive learning rate) of each component of the gradient, and to use the second moment to normalize the first moment to obtain the update quantity of each step:
m_t = β1·m_{t-1} + (1 − β1)·g_t
v_t = β2·v_{t-1} + (1 − β2)·g_t²
c_t = √(1 − β2^t) / (1 − β1^t)
Δθ_t = −η·c_t·m_t / (√v_t + ε)
where m_t is the first moment (momentum), v_t is the second moment (adaptive learning rate), η is the learning rate, c_t is the bias-correction term that prevents divide-by-zero errors and controls the maximum scale of the update quantity, Δθ is the parameter update quantity, and β1 and β2 are the hyperparameters of the exponential moving averages; the smaller their values, the more the averages lean toward recent (local) gradients;
the adaptive learning rate can have a very large variance in the initial training stage; v_t serves to correct the update direction, which addresses the problem that the variance can be very large, and RAdam uses a warmup (preheating) method to avoid converging to a poor local optimum.
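A sketch of the moment equations above, assuming PyTorch tensors; it shows the plain Adam-style step with the bias correction c_t, and omits RAdam's variance rectification and warmup (in practice torch.optim.RAdam could simply be used):

```python
import torch

def adam_style_step(param, grad, state, lr=1e-3,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """EMA first/second moments, bias correction c_t, then normalize the
    first moment by the second to get the update Δθ_t."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # m_t
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # v_t
    c_t = (1 - beta2 ** t) ** 0.5 / (1 - beta1 ** t)           # c_t
    param -= lr * c_t * state["m"] / (state["v"].sqrt() + eps) # Δθ_t
    return param

w = torch.zeros(3)
st = {"t": 0, "m": torch.zeros(3), "v": torch.zeros(3)}
w = adam_style_step(w, torch.tensor([0.1, -0.2, 0.3]), st)
```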
Based on the whole scheme, the innovation points are as follows:
1. Sample data enhancement:
data enhancement operations are performed on the garbage data samples; the basic and higher-order enhancements expand and diversify the garbage samples, enhancing the usability of the samples.
2. Improvement of the network:
the deep residual network ResNet is built with Deformable Convolution, so that the network can better adapt to garbage samples of different types and shapes.
3. Improvement of the optimizer:
a RAdam optimizer is used to help the network converge better and to improve model precision.
4. Improvement of the loss function:
OHEM is used to handle hard samples, dice loss is combined with focal loss, and the loss result of model training is output.
5. Use of a multiscale structure:
different models are trained with different image sizes, first small-size images and then large-size images for finetuning, so that the network is more robust to images of different sizes.
6. Improved autonomous network training:
samples are labeled and trained with a pseudo-label approach, so that the network can train autonomously.
7. The improved network autonomously performs a reasonable threshold search.
Drawings
Fig. 1.1 is a normal convolution.
Fig. 1.2 is a deformable convolution.
Fig. 1.3 is a special case of a deformable convolution.
Fig. 1.4 is another specific example of a deformable convolution.
Detailed Description
The following describes in detail embodiments of the present invention, specifically as follows:
the embodiment relates to a garbage recognition precision improving method of an automatic bag breaking classification box based on an AI algorithm, which comprises the following steps:
step 1: make the data set; manually collect garbage sample images and label each with the garbage-component label it belongs to; divide the data set into a training set, a verification set and a test set, and normalize and preprocess the data set;
step 2: call a garbage sample image from the training set or verification set, and extract multi-scale features with a feature extraction network; the feature extraction network selects Deformable Convolution to build the deep residual network ResNet, combined with an FPN network;
step 3: send the multi-scale features fused by ResNet + FPN in step 2 into the RPN network, which automatically generates a number of anchors of different sizes; obtain the corresponding candidate boxes on the original image according to the anchors, count the overlapping area of each candidate box with the training-set label region, and screen the target-region candidate boxes according to the size of the overlapping area; obtain a loss value for the screened target-region candidate boxes through a binary classification and regression equation, judge from the loss value whether the object in each target box is garbage, and reject the non-garbage candidate boxes;
step 4: for the garbage target-region candidate boxes screened in step 3, obtain a loss value through a multi-class classification and regression equation; according to the loss value, keep the high-quality, high-precision garbage target candidate boxes while rejecting non-garbage candidate boxes, and bind the features of the garbage target candidate boxes to the garbage components;
step 5: generate segmentation masks of the garbage images, output them in logit form, binarize them with a threshold to produce background and foreground segmentation masks, and evaluate the garbage-component segmentation result with a similarity coefficient; output the garbage-component segmentation result.
This embodiment thus implements the AI-algorithm-based garbage recognition precision improvement method for the automatic bag-breaking classification box.
In a specific embodiment, in step 1, since garbage images are of great variety and the collected garbage samples cannot cover every kind of garbage, the data set is augmented with flipping, rotation, translation, and brightness and color changes, enlarging the data set; for garbage samples of different shapes, the higher-order Mixup, Cutout and Mosaic data enhancement methods are adopted, which increases both the size and the complexity of the data set, helps the network model understand the data more deeply, and facilitates the training and precision improvement of the network model.
The Mixup step: assume batch1_x is a batch of samples and batch1_y is the label set corresponding to that batch; batch2_x is another batch of samples and batch2_y is its corresponding label set; λ is a mixing coefficient drawn from a Beta distribution with parameters α and β. The mixup principle formulas are:
λ ~ Beta(α, β)
mixed_batch_x = λ·batch1_x + (1 − λ)·batch2_x
mixed_batch_y = λ·batch1_y + (1 − λ)·batch2_y
where Beta refers to the Beta distribution, mixed_batch_x is the mixed batch of samples, and mixed_batch_y is the label corresponding to the mixed batch.
In a further preferred embodiment, in step 2: garbage items suffer from deformation, occlusion, varying shapes and similar problems, to which ordinary Convolution cannot adapt well; the more effective Deformable Convolution is therefore selected to build the deep residual network ResNet, so that the network can better adapt to garbage samples of different types and shapes.
The Deformable Convolution step:
on the convolution or ROI sampling layer, a displacement (offset) variable is added; this variable is learned from the data. After the offset, each cell of the convolution kernel is in effect stretched or shifted, so the extent of the receptive field changes and the receptive field becomes a polygon.
Figs. 1.1-1.4 depict the deformable and ordinary convolutions in a two-dimensional plane. Fig. 1.1 shows an ordinary convolution with a 3 x 3 kernel and a very regular, square sampling arrangement. Fig. 1.2 is a deformable convolution: an offset is added to each sampling point (this offset is learned by an extra convolution layer), so the arrangement becomes irregular. Figs. 1.3 and 1.4 are two special cases of the deformable convolution: with the offsets of fig. 1.3, a scaling effect is achieved; with the offsets of fig. 1.4, a rotation effect is achieved.
In a further preferred scheme, in step 3 the multi-scale features fused by ResNet + FPN in step 2 are sent to the RPN network for calculation; a set number of candidate regions with the highest score values are taken, and classification and box-regression operations screen the candidate-box positions to obtain the target-region candidate boxes.
Specifically, in step 3, anchors of different sizes are derived by k-means clustering of the sizes of the garbage target objects, and the corresponding candidate boxes are obtained on the original image according to the anchors; the candidate boxes are classification-trained with smooth L1 loss, and regression-trained with a loss function combining a Repulsion loss step and a CIoU loss step.
Wherein the k-means clustering steps are as follows:
<1> first determine a value of k, i.e., the number of sets we want to cluster the data set into;
<2> randomly select k data points from the data set as centroids;
<3> for each point in the data set, compute its distance (e.g., the Euclidean distance) to each centroid, and assign the point to the nearest centroid;
<4> after all the data are assigned, there are k sets in total; then recompute the centroid of each set;
<5> if the distance between each newly computed centroid and the original centroid is below a set threshold (meaning the centroid positions barely change and are stabilizing, i.e., converging), the clustering is considered to have reached the expected result and the algorithm terminates;
<6> if the distance between a new centroid and the original centroid changes greatly, steps <3>-<5> are iterated;
given training samples {x^(1), ..., x^(m)}, where each x^(i) ∈ R^n, i.e., each sample element is an n-dimensional vector (for ease of understanding, a two-dimensional vector is used in the schematic):
step 1: randomly select k cluster centroids μ_1, μ_2, ..., μ_k ∈ R^n;
step 2: repeat the following process, computing for each sample i the class it should belong to,
c^(i) := argmin_j ||x^(i) − μ_j||²
and recomputing, for each class j, the centroid of the class,
μ_j := ( Σ_{i=1}^{m} 1{c^(i) = j}·x^(i) ) / ( Σ_{i=1}^{m} 1{c^(i) = j} )
where K is the given number of clusters and c^(i) denotes the class among the k classes whose centroid is closest to sample i; c^(i) is one of 1 to k. The centroid μ_j represents our guess at the center point of the samples belonging to the same class. Explained with a constellation model: to gather all the stars into k constellations, first randomly pick k points in space (or k stars) as the centroids of the k constellations; in the first step, compute the distance from each star to each of the k centroids and assign the star to the constellation whose centroid is closest, so that after the first step every star has a constellation it belongs to; in the second step, recompute the centroid μ_j of each constellation (by averaging the coordinates of all the stars inside it); iterate the first and second steps until the centroids no longer change, or change very little.
Wherein the Repulsion loss process is as follows:
the prediction box P is required to be close to (attracted by) its own real target T, while P is also required to be far from (repelled by) the other real boxes beside T (e.g., B);
<1> Repulsion loss:
L = L_Attr + α·L_RepGT + β·L_RepBox
as can be seen, the loss function contains 3 modules, explained below;
<2> L_Attr:
the role of this module is to make the prediction box as close as possible to its target box;
L_Attr adopts the regression loss used in general target detection; the Euclidean distance, the SmoothL1 distance, or the IoU distance could be adopted, but for comparability with other algorithms the SmoothL1 distance is used here:
L_Attr = ( Σ_{P∈P+} Smooth_L1(B^P, G_Attr^P) ) / |P+|
P ∈ P+ (all positive samples): the positive samples are the set of detection boxes P divided according to a set IoU threshold;
G_Attr^P: the real target box (ground truth) with the maximum IoU value, matched for each detection box P;
B^P: the prediction box obtained after regression shift of the detection box P;
the formula for smooth_L1 is as follows:
smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise
Smooth_L1(B^P, G_Attr^P) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i − v_i)
as the formula shows, smooth_L1 is computed separately on the top-left coordinates and the width and height (x, y, w, h) of B^P and G_Attr^P, and the results are summed;
the optimization aim is to shorten the distance between the prediction box and the target box, so that the prediction box is as close as possible to the target box;
<3> L_RepGT:
L_RepGT makes the prediction box P and the surrounding target boxes G as far apart as possible:
L_RepGT = ( Σ_{P∈P+} Smooth_ln( IoG(B^P, G_Rep^P) ) ) / |P+|
with IoG(B, G) = area(B ∩ G) / area(G), and
Smooth_ln(x) = −ln(1 − x), if x ≤ σ; (x − σ)/(1 − σ) − ln(1 − σ), if x > σ
here the surrounding target box is the target box with the largest IoU apart from the one already matched, and it is denoted G_Rep^P:
G_Rep^P = argmax_{G ∈ 𝒢 \ {G_Attr^P}} IoU(G, P)
that is, among the remaining ground truths 𝒢 other than the ground truth A matched with the prediction box P, G_Rep^P is the ground truth B with the largest IoU with P.
Wherein the CIoU steps are as follows:
the CIOU calculation formula:
CIOU = IoU − ρ²(b, b^gt)/c² − α·v
where ρ²(b, b^gt) represents the squared Euclidean distance between the center points of the prediction box and the real box (i.e., d in the figure), and c represents the diagonal distance of the minimum closure area that can contain the prediction box and the real box at the same time;
where
v = (4/π²)·( arctan(w^gt/h^gt) − arctan(w/h) )²,  α = v / ( (1 − IoU) + v )
LOSS calculation in CIOU regression:
LOSS_CIOU = 1 − IoU + ρ²(b, b^gt)/c² + α·v
In a further preferred scheme, in step 4, according to the loss value obtained by training, the class score with the highest score of each target recommendation region and the coordinates of the recommendation region are obtained; recommendation regions whose highest score belongs to the background are deleted, and recommendation regions whose highest score does not reach the threshold are eliminated; non-maximum suppression (NMS) is performed on the candidate boxes of the same category, the −1 placeholders are removed from the box indices after NMS, and the top n boxes are kept; finally, the information of each box (y1, x1, y2, x2, Class_ID, Score) is returned and bound to the garbage component. Specifically, the loss value obtained by training is computed with Focal loss; the numbers of boxes in the screened garbage target regions and of non-garbage candidate boxes are severely imbalanced between categories, and the following Focal loss effectively alleviates this class imbalance:
the mathematical definition of Focal loss is as follows:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
when γ = 0, this expression degenerates to Cross Entropy Loss:
CE(p, y) = −log(p), if y = 1; −log(1 − p), otherwise
the true meaning of p_t is defined as follows:
p_t = p, if y = 1; 1 − p, otherwise
combining the two formulas, Cross Entropy Loss becomes:
CE(p_t) = −log(p_t)
adding these weights does help to handle the class imbalance;
the Radam optimizer comprises the following steps:
the core idea is to estimate the first moment (momentum) and the second moment (adaptive learning rate) of each component of the gradient by exponential moving average, and remove the first moment of the normalization by using the second moment to obtain the update quantity of each step:
Figure 421632DEST_PATH_IMAGE028
wherein mt is a first moment (momentum), vt is a second moment (adaptive learning rate), η is a learning rate, ct is a bias correction term (bias correction), the maximum scale for preventing zero-division errors and controlling updating amount, Δ θ is a parameter updating amount, β 1 and β 2 are hyperparameters of exponential moving average, and the smaller the value is, the more the local average is inclined;
RAdam can have very large variance in the initial training stage, plays a role in correcting the updating direction by vt, solves the problem that the variance can be very large, and solves the problem of convergence to a local optimal solution by using a preheating (warmup) method.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that those skilled in the art may make variations, modifications, substitutions and alterations within the scope of the present invention without departing from the spirit and scope of the present invention.

Claims (8)

1. The method for improving the garbage recognition precision of the automatic bag breaking classification box based on the AI algorithm is characterized by comprising the following steps:
step 1: making the data set; manually collecting garbage sample images and labeling each with the garbage-component label it belongs to; dividing the data set into a training set, a verification set and a test set, and normalizing and preprocessing the data set;
step 2: calling a garbage sample image from the training set or verification set, and extracting multi-scale features with a feature extraction network; the feature extraction network selects Deformable Convolution to build the deep residual network ResNet, combined with an FPN network;
step 3: sending the multi-scale features fused by ResNet + FPN in step 2 into the RPN network, which automatically generates a number of anchors of different sizes; obtaining the corresponding candidate boxes on the original image according to the anchors, counting the overlapping area of each candidate box with the training-set label region, and screening the target-region candidate boxes according to the size of the overlapping area; obtaining a loss value for the screened target-region candidate boxes through a binary classification and regression equation, judging from the loss value whether the object in each target box is garbage, and rejecting the non-garbage candidate boxes; specifically: in step 3, the multi-scale features fused by ResNet and FPN in step 2 are sent into the RPN for calculation, a set number of candidate regions with the highest score values are taken, and classification and box-regression operations screen the candidate-box positions to obtain the target-region candidate boxes; in step 3, anchors of different sizes are derived by k-means clustering of the sizes of the garbage target objects, and the corresponding candidate boxes are obtained on the original image according to the anchors; the candidate boxes are classification-trained with smooth L1 loss, and regression-trained with a loss function combining a Repulsion loss step and a CIoU loss step;
step 4: for the garbage target-region candidate boxes screened in step 3, obtaining a loss value through a multi-class classification and regression equation; according to the loss value, keeping the high-quality, high-precision garbage target candidate boxes while rejecting non-garbage candidate boxes, and binding the features of the garbage target candidate boxes to the garbage components;
step 5: generating segmentation masks of the garbage images, outputting them in logit form, binarizing them with a threshold to produce background and foreground segmentation masks, and evaluating the garbage-component segmentation result with a similarity coefficient; and outputting the garbage-component segmentation result.
2. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein: in step 1, data enhancement operations of flipping, rotation, translation, and brightness and color changes are performed on the data set, enlarging the data set; for garbage samples of different shapes, the higher-order Mixup, Cutout and Mosaic data enhancement methods are adopted, increasing the complexity of the data set.
3. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein: the Deformable Convolution step in step 2 is as follows: on the convolution or ROI sampling layer, a displacement (offset) variable is added and learned from the data; after the offset, each cell of the convolution kernel is in effect stretched or shifted, so that the extent of the receptive field changes and the receptive field becomes a polygon.
4. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein:
the k-means clustering step:
<1> first determine a value of k;
<2> randomly select k data points from the data set as centroids;
<3> compute the distance between each point in the data set and each centroid, and assign each point to its nearest centroid;
<4> after all the data are assigned, there are k sets in total; then recompute the centroid of each set;
<5> if the distance between each newly computed centroid and the original centroid is less than a set threshold, the algorithm terminates;
<6> if the distance between a new centroid and the original centroid changes greatly, steps <3>-<5> are iterated;
given training samples {x^(1), ..., x^(m)}, each x^(i) ∈ R^n, i.e., each sample element is an n-dimensional vector;
step 1: randomly select k cluster centroids μ_1, μ_2, ..., μ_k ∈ R^n;
step 2: repeat the following process, computing for each sample i the class it should belong to,
c^(i) := argmin_j ||x^(i) − μ_j||²
and recomputing, for each class μ_j, the centroid of the class,
μ_j := ( Σ_{i=1}^{m} 1{c^(i) = j}·x^(i) ) / ( Σ_{i=1}^{m} 1{c^(i) = j} )
where K is the given number of clusters and c^(i) represents the class among the k classes whose centroid is closest to sample i; c^(i) is one of 1 to k; the centroid μ_j represents a guess at the center point of the samples belonging to the same class.
5. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein:
the Repulsion loss step:
the prediction box P is required to be close to its own real target T, while the prediction box P is also required to be far from the other real boxes beside T;
<1> Repulsion loss:
L = L_Attr + α·L_RepGT + β·L_RepBox
the loss function includes 3 modules, explained below;
<2> L_Attr:
the role of this module is to bring the prediction box as close as possible to its target box;
L_Attr adopts the regression loss used in general target detection, here the SmoothL1 distance:
L_Attr = ( Σ_{P∈P+} Smooth_L1(B^P, G_Attr^P) ) / |P+|
P ∈ P+: the positive samples are the set of detection boxes P divided according to a set IoU threshold;
G_Attr^P: the real target box with the maximum IoU value, matched for each detection box P;
B^P: the prediction box obtained after regression shift of the detection box P;
the formula for smooth_L1 is as follows:
smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise
Smooth_L1(B^P, G_Attr^P) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i − v_i)
wherein v_i is the actual standard (ground-truth) value and t_i is the value predicted by the convolutional neural network;
as the formula shows, smooth_L1 is computed separately on the top-left coordinates and the width and height (x, y, w, h) of B^P and G_Attr^P, and the results are summed;
the optimization aim is to shorten the distance between the prediction box and the target box, so that the prediction box is as close as possible to the target box;
<3> L_RepGT:
L_RepGT makes the prediction box P and the surrounding target boxes G as far apart as possible:
L_RepGT = ( Σ_{P∈P+} Smooth_ln( IoG(B^P, G_Rep^P) ) ) / |P+|
Smooth_ln(x) = −ln(1 − x), if x ≤ σ; (x − σ)/(1 − σ) − ln(1 − σ), if x > σ
wherein σ ∈ [0, 1] is a smoothing parameter used to adjust the sensitivity of the repulsion loss to outliers;
here the surrounding target box is the target box with the largest IoU apart from the one already matched, denoted
G_Rep^P = argmax_{G ∈ 𝒢 \ {G_Attr^P}} IoU(G, P)
wherein 𝒢 is the set of manually labeled (ground-truth) boxes; that is, apart from the ground truth A matched with the prediction box P, G_Rep^P is the ground truth B with the largest IoU with P among the remaining ground truths.
6. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1, wherein:
the CIoU step:
the CIOU calculation formula:
CIOU = IoU − ρ²(b, b^gt)/c² − α·v
where ρ²(b, b^gt) represents the squared Euclidean distance between the center points of the prediction box and the real box, and c represents the diagonal distance of the minimum closure area that can simultaneously contain the prediction box and the real box;
where
v = (4/π²)·( arctan(w^gt/h^gt) − arctan(w/h) )²
α = v / ( (1 − IoU) + v )
wherein w^gt and h^gt denote the width and the height of the manually labeled box, and w and h are the width and the height of the prediction box;
LOSS calculation in CIOU regression:
LOSS_CIOU = 1 − IoU + ρ²(b, b^gt)/c² + α·v
7. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 1 or 2, wherein: in step 4, according to the loss value obtained by training, the class score with the highest score of each target recommendation region and the coordinates of the recommendation region are obtained; recommendation regions whose highest score belongs to the background are deleted, and recommendation regions whose highest score does not reach the threshold are eliminated; non-maximum suppression (NMS) is performed on the candidate boxes of the same category, the −1 placeholders are removed from the box indices after NMS, and the top n boxes are kept; finally, the information of each box (y1, x1, y2, x2, Class_ID, Score) is returned; and this information is bound to the garbage component.
8. The AI-algorithm-based garbage recognition accuracy improving method for the automatic bag-breaking classification box according to claim 7, characterized in that: the loss value obtained by training is computed with Focal loss; the numbers of boxes in the screened garbage target regions and of non-garbage candidate boxes are severely imbalanced between categories, which Focal loss effectively alleviates; the Focal loss steps are as follows:
the mathematical definition of Focal loss is as follows:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
wherein α_t is set to control the shared weight of positive and negative samples on the total loss; taking a smaller α_t reduces the weight of the negative samples;
when γ is 0, this expression degenerates to Cross Entropy Loss:
CE(p, y) = −log(p), if y = 1; −log(1 − p), otherwise
the definition of p_t, with its true meaning, is as follows:
p_t = p, if y = 1; 1 − p, otherwise
combining the two formulas, Cross Entropy Loss becomes:
CE(p_t) = −log(p_t)
adding these weights does help to handle the class imbalance;
the RAdam optimizer steps are as follows:
the core idea is to estimate the first moment and the second moment of each component of the gradient by exponential moving averages, and to normalize the first moment by the second moment to obtain the update quantity of each step:
m_t = β1·m_{t-1} + (1 − β1)·g_t
v_t = β2·v_{t-1} + (1 − β2)·g_t²
c_t = √(1 − β2^t) / (1 − β1^t)
Δθ_t = −η·c_t·m_t / (√v_t + ε)
wherein g_t is the gradient of the objective function, m_t is the first moment, v_t is the second moment, η is the learning rate, c_t is the bias-correction term that prevents divide-by-zero errors and controls the maximum scale of the update quantity, Δθ is the parameter update quantity, ε is a small number ensuring that the denominator is not 0, β1 and β2 are constant decay rates whose smaller values lean the averages toward recent (local) gradients, and β1^t and β2^t are the decay rates varying over time;
the adaptive learning rate can have a very large variance at the beginning of training; v_t serves to correct the update direction, addressing the problem that the variance can be very large, and a warmup (preheating) method is used to avoid converging to a poor local optimum.
CN202110892805.1A 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box Active CN113743470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110892805.1A CN113743470B (en) 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110892805.1A CN113743470B (en) 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Publications (2)

Publication Number Publication Date
CN113743470A CN113743470A (en) 2021-12-03
CN113743470B (en) 2022-08-23

Family

ID=78730108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110892805.1A Active CN113743470B (en) 2021-08-04 2021-08-04 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box

Country Status (1)

Country Link
CN (1) CN113743470B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782759B (en) * 2022-06-22 2022-09-13 鲁东大学 Method for detecting densely-occluded fish based on YOLOv5 network
CN115393892B (en) * 2022-07-20 2023-08-04 东北电力大学 Congestion scene pedestrian detection method based on improved double-candidate-frame cross replacement strategy and loss function
CN114937179B (en) * 2022-07-27 2022-12-13 深圳市海清数字技术有限公司 Junk image classification method and device, electronic equipment and storage medium
CN115471871A (en) * 2022-09-22 2022-12-13 四川农业大学 Sheldrake gender classification and identification method based on target detection and classification network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109201514B (en) * 2017-06-30 2019-11-08 京东方科技集团股份有限公司 Waste sorting recycle method, garbage classification device and classified-refuse recovery system
CN108830205B (en) * 2018-06-04 2019-06-14 江南大学 Based on the multiple dimensioned perception pedestrian detection method for improving full convolutional network
CN110852420B (en) * 2019-11-11 2021-04-13 北京智能工场科技有限公司 Garbage classification method based on artificial intelligence
CN111079639B (en) * 2019-12-13 2023-09-19 中国平安财产保险股份有限公司 Method, device, equipment and storage medium for constructing garbage image classification model
CN111259809B (en) * 2020-01-17 2021-08-17 五邑大学 Unmanned aerial vehicle coastline floating garbage inspection system based on DANet
CN112016462A (en) * 2020-08-28 2020-12-01 佛山市南海区广工大数控装备协同创新研究院 Recovery bottle classification method based on deep learning model

Also Published As

Publication number Publication date
CN113743470A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN113743470B (en) AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
CN111191732B (en) Target detection method based on full-automatic learning
CN109614985B (en) Target detection method based on densely connected feature pyramid network
CN106960214B (en) Object recognition method based on image
CN111695482A (en) Pipeline defect identification method
CN110110802A (en) Airborne laser point cloud classification method based on high-order condition random field
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN112101430B (en) Anchor frame generation method for image target detection processing and lightweight target detection method
CN107633226B (en) Human body motion tracking feature processing method
Duque-Arias et al. On power Jaccard losses for semantic segmentation
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN107341447A (en) A kind of face verification mechanism based on depth convolutional neural networks and evidence k nearest neighbor
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN103559504A (en) Image target category identification method and device
CN111161244B (en) Industrial product surface defect detection method based on FCN + FC-WXGboost
CN113408605A (en) Hyperspectral image semi-supervised classification method based on small sample learning
CN109284779A (en) Object detecting method based on the full convolutional network of depth
CN108154158B (en) Building image segmentation method for augmented reality application
CN112802054A (en) Mixed Gaussian model foreground detection method fusing image segmentation
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN104680193A (en) Online target classification method and system based on fast similarity network fusion algorithm
CN113705579A (en) Automatic image annotation method driven by visual saliency
CN111488911A (en) Image entity extraction method based on Mask R-CNN and GAN
Zeng et al. Steel sheet defect detection based on deep learning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant