CN115761259A

CN115761259A - Kitchen waste target detection method and system based on class balance loss function

Info

Publication number: CN115761259A
Application number: CN202211418560.XA
Authority: CN
Inventors: 方乐缘; 唐崎; 欧阳立韩; 汤琳; 梁桥康
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2022-11-14
Filing date: 2022-11-14
Publication date: 2023-03-07
Anticipated expiration: 2042-11-14
Also published as: CN115761259B

Abstract

The invention discloses a kitchen waste target detection method and system based on a class balance loss function. The method comprises: constructing a kitchen waste detection data set; constructing a kitchen waste target detection model comprising a feature extraction network, a feature fusion network, a region proposal generation network, a RoI Transformer network and a detection head network; constructing an L1 regression loss function and a class balance loss function, training the target detection model on the training set of the kitchen waste detection data set, computing the loss from the bounding box regression results and classification probabilities of the class targets together with the preset ground-truth bounding boxes and class labels, and updating the network weights of the model by back-propagating the loss gradient with a gradient descent method; and acquiring an actual kitchen waste image and detecting it with the trained kitchen waste target detection model to obtain a target detection result. The method effectively improves the detection precision of each class of recyclable waste in kitchen waste.

Description

Kitchen waste target detection method and system based on class balance loss function

Technical Field

The invention belongs to the technical field of digital image processing, and particularly relates to a kitchen garbage target detection method and system based on a class balance loss function.

Background

Kitchen waste has a complex composition and is mixed with a large amount of impurities such as glass bottles, metal cans and various plastic products. These impurities hinder the harmless treatment and full disposal of kitchen waste, so sorting kitchen waste has become an urgent problem. At present, kitchen waste is sorted mainly by hand. Manual sorting is time-consuming, laborious and inefficient, and the bacteria produced by rotting food during sorting seriously endanger workers' health. Automatic sorting of kitchen waste is therefore necessary, and research on intelligent kitchen waste detection algorithms is an essential part of realizing it.

In practice, the recyclables in kitchen waste show a severely imbalanced distribution of per-class sample counts. Some classes, such as irregular soft plastics, account for most of the recyclables and are referred to as majority classes; other classes, such as glass products, make up only a small fraction and are referred to as minority classes. Deep-learning-based target detection methods generally fail to learn the features of minority-class targets effectively, which results in poor detection performance on them. To improve minority-class detection, existing methods fall mainly into two types: resampling and re-weighting. Resampling undersamples the majority classes (deleting data), oversamples the minority classes (adding duplicate data), or both. Re-weighting assigns larger weights to minority classes and smaller weights to majority classes, mainly in the classification loss function used during neural network training. Oversampling introduces many repeated samples, slows down training and easily causes overfitting; undersampling discards a large amount of data and, like oversampling, can also lead to overfitting. Re-weighting methods are therefore more widely applied. However, the class distribution of kitchen waste is extremely imbalanced in practice, so simple re-weighting cannot effectively improve detection performance; moreover, re-weighting lowers the weights of majority-class samples, which impairs feature learning for the hard samples within the majority classes.

Disclosure of Invention

To address the above technical problems, the invention provides a kitchen waste target detection method and system based on a class balance loss function.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a kitchen garbage target detection method based on a class balance loss function comprises the following steps:

S100: collecting kitchen waste images and constructing a kitchen waste detection data set;

S200: constructing a kitchen waste target detection model, wherein the model comprises a feature extraction network, a feature fusion network, a region proposal generation network, a RoI Transformer network and a detection head network connected in sequence; the feature extraction network extracts multi-scale features of a kitchen waste image; the feature fusion network fuses the multi-scale features and sends the fused multi-scale features to the region proposal generation network; the region proposal generation network generates candidate boxes for the fused multi-scale features, and positive and negative samples are obtained according to the intersection over union (IoU) of each candidate box with the ground-truth bounding box; the RoI Transformer network aligns the candidate box samples, extracts the aligned sample features according to a preset ratio of positive to negative samples, and inputs them into the detection head network to obtain the bounding box regression result and the classification probability of each class target;

S300: constructing an L1 regression loss function and a class balance loss function; training the kitchen waste target detection model on the training set of the kitchen waste detection data set; calculating the loss value of the model from the bounding box regression result and classification probability of each class target together with the preset ground-truth bounding box and class label; and back-propagating the loss gradient and updating the network weights with a gradient descent method according to the loss value, to obtain the trained kitchen waste target detection model; wherein the class balance loss function comprises an inter-class balance weighting factor based on the number of samples and a weighting factor based on sample difficulty;

S400: acquiring an actual kitchen waste image and detecting it with the trained kitchen waste target detection model to obtain a target detection result.

Preferably, the feature extraction network comprises a dimension expansion layer, a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer. The dimension expansion layer expands the input image to the dimension required as input by the first feature extraction layer. The first, second, third and fourth feature extraction layers comprise 3, 3, 9 and 3 ConvNeXt blocks respectively. The output feature maps of the first, second and third feature extraction layers are each downsampled by a factor of 2 and then input to the next feature extraction layer. Each ConvNeXt block comprises a depthwise separable convolution layer with a 7 × 7 kernel, a normalization layer, two 1 × 1 convolution layers and a GELU activation function.

Preferably, the feature fusion network comprises a first, a second, a third and a fourth feature fusion layer, wherein the i-th feature fusion layer enlarges the feature map output by the i-th feature extraction layer, by nearest-neighbor interpolation upsampling, to the same size as the feature map output by the previous feature extraction layer, and then fuses it with that feature map by an addition operation to obtain the fused feature map $L_i$ output by the i-th feature fusion layer.
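The upsample-then-add fusion described above can be illustrated with a minimal sketch. It assumes single-channel feature maps stored as nested lists and an exact size ratio of 2 between adjacent levels; the function names `upsample_nearest_2x` and `fuse` are hypothetical, and any channel-matching convolution a real network would need is omitted.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map (assumption for illustration)


def upsample_nearest_2x(feature: Grid) -> Grid:
    """Double height and width by repeating each value (nearest-neighbor interpolation)."""
    out: Grid = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]  # repeat each value along the width
        out.append(wide)
        out.append(list(wide))                     # repeat the widened row along the height
    return out


def fuse(deeper: Grid, shallower: Grid) -> Grid:
    """Upsample the deeper (smaller) map to the shallower map's size, then add element-wise."""
    up = upsample_nearest_2x(deeper)
    if len(up) != len(shallower) or len(up[0]) != len(shallower[0]):
        raise ValueError("shallower map must be exactly twice the size of the deeper map")
    return [[a + b for a, b in zip(r_up, r_sh)] for r_up, r_sh in zip(up, shallower)]


deeper = [[1.0, 2.0],
          [3.0, 4.0]]
shallower = [[0.5] * 4 for _ in range(4)]
fused = fuse(deeper, shallower)
```

Nearest-neighbor upsampling has no learnable parameters, so the fusion itself adds no weights to the model.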

Preferably, the region proposal generation network comprises one 3 × 3 convolution and two 1 × 1 convolutions connected in series, and in S200 the generation of candidate boxes for the fused multi-scale features and the obtaining of positive and negative samples according to the intersection over union (IoU) of each candidate box with the ground-truth bounding box comprise:

after the fused feature map $L_i$ is input to the region proposal generation network, a series of candidate boxes is generated; a candidate box whose IoU with the ground-truth bounding box is greater than 0.7 is defined as a positive sample, and a candidate box whose IoU with the ground-truth bounding box is less than 0.3 is defined as a negative sample.
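The 0.7 / 0.3 thresholding on the intersection ratio (intersection over union, IoU) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: boxes are axis-aligned `(x1, y1, x2, y2)` tuples for simplicity (the data set is annotated with rotated boxes), a candidate is compared against its best-matching ground-truth box, and candidates between the two thresholds are treated as ignored, which the text does not specify. The names `iou` and `label_candidate` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_candidate(candidate: Box, gt_boxes: Sequence[Box],
                    pos_thr: float = 0.7, neg_thr: float = 0.3) -> Optional[int]:
    """Return 1 (positive), 0 (negative) or None (between thresholds; assumed ignored)."""
    best = max((iou(candidate, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thr:
        return 1
    if best < neg_thr:
        return 0
    return None


gt = [(0.0, 0.0, 10.0, 10.0)]
labels = [
    label_candidate((0.0, 0.0, 10.0, 9.0), gt),    # IoU 0.9   -> positive
    label_candidate((5.0, 5.0, 15.0, 15.0), gt),   # IoU ~0.14 -> negative
    label_candidate((0.0, 0.0, 10.0, 5.0), gt),    # IoU 0.5   -> neither
]
```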

Preferably, the constructing the L1 regression loss function and the class balance loss function in S300 includes:

S310: constructing an L1 regression loss function and a cross-entropy classification loss function;

S320: designing an inter-class balance weighting factor based on the number of samples;

S330: designing a weighting factor based on sample difficulty;

S340: constructing the class balance loss function from the cross-entropy classification loss function, the inter-class balance weighting factor based on the number of samples, and the weighting factor based on sample difficulty.

Preferably, S310 includes:

using the L1 regression loss function:

$$L_1(x) = |x|$$

wherein x is the independent variable of the L1 regression loss function and represents the difference between the ground-truth bounding box and the candidate box of the class target;

using the cross-entropy classification loss function:

$$L_{CE}(P_1, y) = \begin{cases} -\log(P_1), & y = 1 \\ -\log(1 - P_1), & \text{otherwise} \end{cases}$$

wherein y is the class label, taking the values 0 and 1 in binary classification; y = 1 indicates that the classification result is consistent with the ground-truth label, and $P_1$ is the probability that the candidate box contains a target of the class. To simplify the formula, define:

$$P_t = \begin{cases} P_1, & y = 1 \\ 1 - P_1, & \text{otherwise} \end{cases}$$

the simplified cross-entropy loss function is then:

$$L_{CE}(P_t, y) = -\log(P_t)$$

$P_t$ is the classification confidence of each sample in class y, and $P_t \in (0, 1)$.
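As a small numerical illustration of the simplified cross-entropy, the sketch below assumes the standard binary form in which the confidence P_t equals the predicted probability when y = 1 and one minus that probability otherwise. The function names `p_t` and `cross_entropy` are hypothetical.

```python
import math


def p_t(p: float, y: int) -> float:
    """Confidence assigned to the true label: p if y == 1, else 1 - p (assumed standard form)."""
    return p if y == 1 else 1.0 - p


def cross_entropy(p: float, y: int) -> float:
    """Simplified cross-entropy loss, -log(P_t), for a single sample."""
    return -math.log(p_t(p, y))


confident_right = cross_entropy(0.9, 1)  # small loss: prediction agrees with the label
confident_wrong = cross_entropy(0.9, 0)  # large loss: prediction contradicts the label
```

A confident correct prediction contributes little loss, while a confident wrong one contributes much more. The difficulty weighting introduced later builds on this distinction.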

Preferably, S320 includes:

defining the total number of classes in the data set as N, so that the numbers of samples of the class labels in each training batch are $[n_1, n_2, n_3, \ldots, n_N]$, and the number of samples of class y in each training batch is $n_y$;

defining the sample number normalization factor of class label y by the following formula:

wherein $n_{max}$ is the maximum of the sample numbers of all classes in each training batch;

defining parameters m and n, wherein 0 < m < n < 1, and setting the weighting factor of class y in each training batch as

And is provided with

Then the sample weighting for class y in each training batch is defined

The formula is as follows:

wherein $\delta_{max}$ and $\delta_{min}$ are respectively the maximum and minimum of the sample number normalization factors of all classes in each training batch;

defining the adaptive inter-class balance weighting factor $\omega_y$ of class y in each training batch by the following formula:

wherein the sample weighting degree and $n_y$ are respectively the weighting degree and the number of samples of class y in each training batch, and the inter-class balance weighting factor is inversely related to both the sample weighting degree and the number of samples;

finally, the class balance weighting factors of all categories are normalized, so that:

where N is the total number of categories in the dataset and i is the sequence number of each category.

Preferably, S330 is specifically:

defining the sample difficulty weighting factor γ by the following formula:

$$\gamma = 2 - \alpha(1 + P_t)$$

wherein $P_t$ is the classification confidence of each sample in each training batch, with $P_t \in (0, 1)$, and α is the difficulty modulation factor, which takes values between 0.5 and 1, the upper bound 1 included.
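The behaviour of the difficulty weighting factor γ = 2 - α(1 + P_t) can be checked with a short sketch. The function name `difficulty_weight` is hypothetical, and the example uses α = 1.0 (the upper end of the stated range) purely for round numbers. The sketch only evaluates γ and does not show how γ enters the final loss.

```python
def difficulty_weight(p_t: float, alpha: float) -> float:
    """Sample difficulty weighting factor: gamma = 2 - alpha * (1 + P_t)."""
    return 2.0 - alpha * (1.0 + p_t)


# alpha = 1.0 is chosen here only to give round numbers.
gamma_hard = difficulty_weight(0.25, alpha=1.0)  # low confidence (hard sample)  -> 0.75
gamma_easy = difficulty_weight(0.75, alpha=1.0)  # high confidence (easy sample) -> 0.25
```

For a fixed α, γ falls linearly as P_t rises, so low-confidence (hard) samples receive a larger γ than high-confidence (easy) ones.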

Preferably, S340 specifically is:

in each training batch,

wherein the sample weighting degree is that of class y, $n_y$ is the number of samples of class y, α is the difficulty modulation factor, and $P_t$ is the classification confidence of each sample.

A kitchen waste target detection system based on a class balance loss function comprises:

the kitchen waste image acquisition module is used for collecting kitchen waste images and constructing a kitchen waste detection data set;

the kitchen waste target detection model building module is used for constructing a kitchen waste target detection model, wherein the model comprises a feature extraction network, a feature fusion network, a region proposal generation network, a RoI Transformer network and a detection head network connected in sequence; the feature extraction network extracts multi-scale features of a kitchen waste image; the feature fusion network fuses the multi-scale features and sends the fused multi-scale features to the region proposal generation network; the region proposal generation network generates candidate boxes for the fused multi-scale features, and positive and negative samples are obtained according to the intersection over union (IoU) of each candidate box with the ground-truth bounding box; the RoI Transformer network aligns the candidate box samples, extracts the aligned sample features according to a preset ratio of positive to negative samples, and inputs them into the detection head network to obtain the bounding box regression result and the classification probability of each class target;

the kitchen waste target detection model training module is used for constructing an L1 regression loss function and a class balance loss function, training the kitchen waste target detection model on the kitchen waste detection training set, calculating the loss value of the model from the bounding box regression result and classification probability of each class target together with the preset ground-truth bounding box and class label, and back-propagating the loss gradient and updating the network weights with a gradient descent method according to the loss value, to obtain the trained kitchen waste target detection model;

the detection module is used for acquiring an actual kitchen waste image and detecting it with the trained kitchen waste target detection model to obtain a target detection result.

According to the kitchen waste target detection method and system, the class balance loss function applies adaptive inter-class balance weighting to each class in each training batch, so that the network model focuses on training the minority-class samples; at the same time, the sample difficulty weighting factor weights the hard samples in every class to different degrees, so that the features of hard samples are mined. The method significantly improves the average detection precision of minority-class samples in kitchen waste, and also slightly improves the average detection precision of majority-class samples.

Drawings

FIG. 1 is a flowchart illustrating a kitchen waste target detection method based on a class balance loss function according to an embodiment of the present invention;

FIG. 2 is a sample of each category in a kitchen waste image dataset according to an embodiment of the present invention;

FIG. 3 is a statistical chart of the number of classes in a training set of kitchen waste image data according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a kitchen waste target detection model according to an embodiment of the present invention;

FIG. 5 is a schematic comparison of the detection results of the method of the present invention and existing methods on the kitchen waste image data set, wherein (a) is the detection result of the cross-entropy loss method, (b) is the detection result of the focal loss method, (c) is the detection result of the method of the present invention, and (d) is the ground-truth labels.

Detailed Description

In order to make the technical solutions of the present invention better understood, the present invention is further described in detail below with reference to the accompanying drawings.

In one embodiment, as shown in fig. 1, a kitchen waste target detection method based on a class balance loss function includes the following steps:

S100: collecting kitchen waste images and constructing a kitchen waste detection data set.

Specifically, first, image data of the kitchen waste disposal station is acquired, and 13871 kitchen waste images are acquired in total.

Secondly, as shown in FIG. 2, the recyclable waste in the data set images is classified according to the national waste category definitions, and the recyclable waste targets in the images are then annotated manually with rotated rectangular boxes using annotation software. There are 7 recyclable waste classes, with the following numbers of annotations: irregular soft plastic, 80866; regular soft plastic, 46520; hard plastic, 251166; Tetra Pak, 6358; plastic bottle, 5776; glass product, 2698; and metal product, 2361.
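The degree of imbalance implied by these annotation counts can be made concrete with a few lines of arithmetic. The sketch below uses the counts exactly as stated above; the variable names are hypothetical.

```python
# Annotation counts per recyclable class, as listed in the text.
label_counts = {
    "irregular soft plastic": 80866,
    "regular soft plastic": 46520,
    "hard plastic": 251166,
    "Tetra Pak": 6358,
    "plastic bottle": 5776,
    "glass product": 2698,
    "metal product": 2361,
}

total = sum(label_counts.values())
# Share of all annotations held by each class.
shares = {name: count / total for name, count in label_counts.items()}
# Ratio between the most and least frequent classes.
imbalance_ratio = max(label_counts.values()) / min(label_counts.values())

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {share:6.2%}")
print(f"max/min ratio: {imbalance_ratio:.1f}")
```

With these figures the rarest class has fewer than 1% of the annotations, and the most frequent class has over a hundred times as many as the rarest.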

And finally, randomly dividing the kitchen garbage data into training sets and testing sets according to the proportion of 4. The number of labels of each class of training data is shown in fig. 3.

Further, after the kitchen waste detection data set is constructed, the training set is preprocessed: all images in the training set are scaled to 1024 × 1024 by bilinear interpolation; to prevent overfitting during network training and improve the generalization of the network, data augmentation is performed by random flipping about the horizontal axis, the vertical axis and the diagonal, with flip probabilities of 0.25, 0.25 and 0.25 respectively; finally, the images are normalized with the set per-channel means (123.676, 116.28, 103.53) and standard deviations (58.395, 57.12, 57.375).
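The flip augmentation and normalization can be sketched as below. This is a minimal illustration under several assumptions: the three flips are treated as mutually exclusive (so no flip is applied with the remaining probability 0.25), a "diagonal" flip is taken to mean flipping along both axes, images are nested lists of pixels, and the second tuple of constants is used as per-channel divisors. Bilinear resizing and the matching transformation of the annotated boxes are omitted. All function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

MEAN = (123.676, 116.28, 103.53)  # per-channel means from the text
STD = (58.395, 57.12, 57.375)     # per-channel divisors from the text


def normalize_pixel(rgb: Sequence[float]) -> Tuple[float, ...]:
    """Subtract the channel mean and divide by the channel divisor."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


def choose_flip(rng: random.Random) -> Optional[str]:
    """Pick one flip direction with probability 0.25 each, or None (no flip)."""
    return rng.choice(["horizontal", "vertical", "diagonal", None])


def flip(image: List[list], direction: Optional[str]) -> List[list]:
    """Flip a row-major image; 'diagonal' flips along both axes (assumption)."""
    if direction == "horizontal":
        return [row[::-1] for row in image]        # mirror left-right
    if direction == "vertical":
        return image[::-1]                         # mirror top-bottom
    if direction == "diagonal":
        return [row[::-1] for row in image[::-1]]  # both at once
    return image


rng = random.Random(0)
augmented = flip([[1, 2], [3, 4]], choose_flip(rng))
```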

S200: constructing a kitchen waste target detection model, wherein the model comprises a feature extraction network, a feature fusion network, a region proposal generation network, a RoI Transformer network and a detection head network connected in sequence; the feature extraction network extracts multi-scale features of a kitchen waste image; the feature fusion network fuses the multi-scale features and sends the fused multi-scale features to the region proposal generation network; the region proposal generation network generates candidate boxes for the fused multi-scale features, and positive and negative samples are obtained according to the intersection over union (IoU) of each candidate box with the ground-truth bounding box; the RoI Transformer network aligns the candidate box samples, extracts the aligned sample features according to a preset ratio of positive to negative samples, and inputs them into the detection head network to obtain the bounding box regression result and the classification probability of each class target.

Specifically, a structure diagram of the kitchen waste target detection model is shown in fig. 4.

In one embodiment, the feature extraction network comprises a dimension expansion layer, a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer connected in sequence. The dimension expansion layer expands the input image to the dimension required as input by the first feature extraction layer. The first, second, third and fourth feature extraction layers comprise 3, 3, 9 and 3 ConvNeXt blocks respectively. The output feature maps of the first, second and third feature extraction layers are each downsampled by a factor of 2 and then input to the next feature extraction layer. Each ConvNeXt block comprises a depthwise separable convolution layer with a 7 × 7 kernel, a normalization layer, two 1 × 1 convolution layers and a GELU activation function.

Specifically, a first feature extraction layer and a second feature extraction layer are used for shallow feature extraction, and a third feature extraction layer and a fourth feature extraction layer are used for deep feature extraction.
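To make the multi-scale structure concrete, the sketch below computes the spatial size of each stage's output from the 2× downsampling between stages. The stride of the dimension expansion layer is not stated in the text; a stride of 4 is assumed here for illustration only, and the names `STAGE_DEPTHS` and `stage_resolutions` are hypothetical.

```python
from typing import List

STAGE_DEPTHS = (3, 3, 9, 3)  # ConvNeXt blocks per feature extraction layer, from the text


def stage_resolutions(input_size: int, stem_stride: int = 4) -> List[int]:
    """Side length of each stage's feature map, halving between consecutive stages.

    stem_stride is an assumption; the text does not give the stride of the
    dimension expansion layer.
    """
    size = input_size // stem_stride
    sizes = []
    for _ in STAGE_DEPTHS:
        sizes.append(size)
        size //= 2  # 2x downsampling before the next stage
    return sizes


resolutions = stage_resolutions(1024)  # 1024 x 1024 inputs, as used in preprocessing
total_blocks = sum(STAGE_DEPTHS)
```

Under the assumed stem stride, a 1024 × 1024 input gives feature maps of side 256, 128, 64 and 32. These four maps are what the feature fusion layers then combine.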

In one embodiment, the feature fusion network comprises a first, a second, a third and a fourth feature fusion layer, wherein the i-th feature fusion layer enlarges the feature map output by the i-th feature extraction layer, by nearest-neighbor interpolation upsampling, to the same size as the feature map output by the previous feature extraction layer, and then fuses it with that feature map by an addition operation to obtain the fused feature map $L_i$ output by the i-th feature fusion layer.

In one embodiment, the region proposal generation network comprises one 3 × 3 convolution and two 1 × 1 convolutions connected in series, and in S200 the generation of candidate boxes for the fused multi-scale features and the obtaining of positive and negative samples according to the intersection over union (IoU) of each candidate box with the ground-truth bounding box comprise:

after the fused feature map $L_i$ is input to the region proposal generation network, a series of candidate boxes is generated; a candidate box whose IoU with the ground-truth bounding box is greater than 0.7 is defined as a positive sample, and a candidate box whose IoU with the ground-truth bounding box is less than 0.3 is defined as a negative sample.

S300: constructing an L1 regression loss function and a class balance loss function; training the kitchen waste target detection model on the training set of the kitchen waste detection data set; calculating the loss value of the model from the bounding box regression result and classification probability of each class target together with the preset ground-truth bounding box and class label; and back-propagating the loss gradient and updating the network weights with a gradient descent method according to the loss value, to obtain the trained kitchen waste target detection model; wherein the class balance loss function comprises an inter-class balance weighting factor based on the number of samples and a weighting factor based on sample difficulty.

Specifically, the training configuration of the network model is set as follows: the maximum number of training epochs is 12; the training batch size is 4, that is, 4 images and their annotation files are read in each training iteration; the optimizer is AdamW; the initial learning rate is 0.0002; and the weight decay is 0.05.
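These training settings could be collected in a configuration structure such as the following sketch. The key names are hypothetical and not tied to any particular framework; the training-set size passed to `iterations_per_epoch` is a made-up figure used only to show the arithmetic.

```python
import math

# Training settings from the text; key names are illustrative only.
train_config = {
    "max_epochs": 12,
    "batch_size": 4,
    "optimizer": {"type": "AdamW", "lr": 0.0002, "weight_decay": 0.05},
}


def iterations_per_epoch(num_train_images: int, batch_size: int) -> int:
    """Number of weight updates per epoch when the last partial batch is kept."""
    return math.ceil(num_train_images / batch_size)


# Hypothetical training-set size, for illustration only.
steps = iterations_per_epoch(1000, train_config["batch_size"])
total_steps = steps * train_config["max_epochs"]
```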

The test configuration of the network model is set as follows: the non-maximum suppression threshold of the region proposal generation network is 0.7, the intersection over union (IoU) threshold of the detection head network is 0.01, and the score threshold is 0.05.
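How a score threshold and a non-maximum suppression threshold act on detections can be illustrated with a greedy NMS sketch. It is simplified to axis-aligned `(x1, y1, x2, y2)` boxes, whereas the model described here works with rotated boxes; the function names are hypothetical, and the example uses the 0.7 overlap threshold and 0.05 score threshold quoted above.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_thr: float, score_thr: float = 0.0) -> List[int]:
    """Greedy NMS: drop low scores, then keep a box only if it overlaps no kept box too much."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 9.0),
         (20.0, 20.0, 30.0, 30.0), (0.0, 0.0, 5.0, 5.0)]
scores = [0.9, 0.8, 0.7, 0.03]
kept = nms(boxes, scores, iou_thr=0.7, score_thr=0.05)
```

In this example the second box is suppressed because it overlaps the first with IoU 0.9, and the fourth is discarded by the score threshold.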

In one embodiment, constructing the L1 regression loss function and the class balance loss function in S300 includes:

S310: constructing an L1 regression loss function and a cross-entropy classification loss function;

S320: designing an inter-class balance weighting factor based on the number of samples;

S330: designing a weighting factor based on sample difficulty;

S340: constructing the class balance loss function from the cross-entropy classification loss function, the inter-class balance weighting factor based on the number of samples, and the weighting factor based on sample difficulty.

In one embodiment, S310 includes:

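using the L1 regression loss function:

$$L_1(x) = |x|$$

wherein x is the independent variable of the L1 regression loss function and represents the difference between the ground-truth bounding box and the candidate box of the class target;

using the cross-entropy classification loss function:

$$L_{CE}(P_1, y) = \begin{cases} -\log(P_1), & y = 1 \\ -\log(1 - P_1), & \text{otherwise} \end{cases}$$

wherein y is the class label, taking the values 0 and 1 in binary classification; y = 1 indicates that the classification result is consistent with the ground-truth label, and $P_1$ is the probability that the candidate box contains a target of the class. To simplify the formula, define:

$$P_t = \begin{cases} P_1, & y = 1 \\ 1 - P_1, & \text{otherwise} \end{cases}$$

the simplified cross-entropy loss function is then:

$$L_{CE}(P_t, y) = -\log(P_t)$$

$P_t$ is the classification confidence of each sample in class y, and $P_t \in (0, 1)$.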

In one embodiment, S320 includes:

defining the total number of classes in the data set as N, so that the numbers of samples of the class labels in each training batch are $[n_1, n_2, n_3, \ldots, n_N]$, and the number of samples of class y in each training batch is $n_y$;

defining parameters m and n, wherein 0 < m < n < 1, and setting the weighting factor of class y in each training batch as

And is provided with

Then the sample weighting for class y in each training batch is defined

The formula is as follows:

wherein $\delta_{max}$ and $\delta_{min}$ are respectively the maximum and minimum of the sample number normalization factors of all classes in each training batch;

the inter-class balance weighting factor is inversely related to both the sample weighting degree and the number of samples;

wherein N is the total number of the categories in the kitchen garbage detection data set, and i is the serial number of each category.

In one embodiment, S330 specifically is:

$$\gamma = 2 - \alpha(1 + P_t)$$

Specifically, because the inter-class balance weighting factor reduces the loss contribution of the majority classes, the features of their hard samples may be learned insufficiently; the sample difficulty weighting factor γ alleviates this problem, so that the loss function effectively mines the features of hard samples.

In one embodiment, S340 specifically includes:

in each training batch,

wherein the sample weighting degree is that of class y, $n_y$ is the number of samples of class y, α is the difficulty modulation factor, and $P_t$ is the classification confidence of each sample.

Specifically, before training, the parameters m and n used by the class balance loss function to compute the inter-class balance weighting factors of all classes in each training batch must be set as hyperparameters, with 0 < m < n < 1. In the present invention, m and n are 0.4 and 0.6 respectively. The difficulty modulation factor α used to compute the sample difficulty weighting factor γ in each training batch is also a hyperparameter, whose optimal value must be found through multiple experiments; here α is 0.63. The constructed network model is then trained iteratively with the training configuration set above to obtain the trained kitchen waste target detection model.
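As a worked example, substituting the chosen value α = 0.63 into the difficulty weighting factor, with the classification confidence restricted to the open interval (0, 1) as stated earlier, gives the range of γ actually used in training:

```latex
\gamma = 2 - 0.63\,(1 + P_t), \qquad P_t \in (0, 1)
\;\Longrightarrow\; \gamma \in (0.74,\ 1.37)
% P_t \to 1 (easy sample): \gamma \to 2 - 0.63 \cdot 2 = 0.74
% P_t \to 0 (hard sample): \gamma \to 2 - 0.63 \cdot 1 = 1.37
```

With this setting, γ decreases linearly from just under 1.37 for the hardest samples to just over 0.74 for the easiest ones.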

The advantages of the invention are illustrated below with reference to one embodiment.

The method of the invention is compared with two existing methods: the cross-entropy classification loss function (Cross Entropy Loss) and the focal classification loss function (Focal Loss).

The comparison of the average detection precision results of each category on the test set by the method and the existing method is shown in table 1:

TABLE 1 average detection accuracy result comparison for each category of kitchen garbage test set

As can be seen from Table 1, the method of the present invention achieves better quantitative results than the other methods. A visual comparison of the method of the present invention with the other methods is shown in FIG. 5, where FIG. 5 (a) is the detection result of the cross-entropy loss method, FIG. 5 (b) is the detection result of the focal loss method, FIG. 5 (c) is the detection result of the method of the present invention, and FIG. 5 (d) is the ground-truth labels. Both the quantitative and the visual results show that the method of the present invention has the best detection performance.

In one embodiment, a kitchen waste target detection system based on a class balance loss function includes:

the kitchen waste target detection model training module is used for constructing an L1 regression loss function and a class balance loss function, training the kitchen waste target detection model on the training set of the kitchen waste detection data set, calculating the loss value of the model from the bounding box regression result and classification probability of each class target together with the preset ground-truth bounding box and class label, and back-propagating the loss gradient and updating the network weights with a gradient descent method according to the loss value, to obtain the trained kitchen waste target detection model;

For specific limitation of the kitchen waste target detection system based on the category balance loss function, reference may be made to the description of a kitchen waste target detection method based on the category balance loss function, which is not repeated herein.

Compared with the prior art, the invention has the following beneficial effects:

the invention provides a kitchen waste target detection method and system based on a class balance loss function, in which the classification loss function of the target detection network is constructed as a class balance loss function. In each training batch, the constructed class balance loss function applies adaptive inter-class balance weighting to each class, so that the network model pays attention to training the minority-class samples; at the same time, the sample difficulty weighting factor weights the hard samples in every class to different degrees, so that the features of hard samples are mined. The method significantly improves the average detection precision of minority-class samples such as plastic bottles, glass products and metal products, and also slightly improves the average detection precision of majority-class samples.

The kitchen waste target detection method and system based on the class balance loss function provided by the invention are described in detail above. The principles and embodiments of the present invention have been described herein using specific examples, which are presented only to assist in understanding the core concepts of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A kitchen waste target detection method based on a class balance loss function is characterized by comprising the following steps:

S200: constructing a kitchen garbage target detection model, wherein the kitchen garbage target detection model comprises a feature extraction network, a feature fusion network, a region suggestion generation network, a RoI Transformer network and a detection head network which are sequentially connected; the feature extraction network is used for extracting multi-scale features of a kitchen garbage image; the feature fusion network is used for fusing the multi-scale features and sending the fused multi-scale features to the region suggestion generation network; the region suggestion generation network is used for generating candidate boxes from the fused multi-scale features, and positive samples and negative samples are obtained according to the intersection-over-union (IoU) of the candidate boxes and the real bounding box; and the RoI Transformer network is used for aligning the candidate box samples, extracting the aligned sample features according to a preset ratio of positive to negative samples, and inputting the aligned sample features to the detection head network to obtain a bounding box regression result and a classification result probability of a category target;

S300: constructing an L1 regression loss function and a class balance loss function, training the kitchen garbage target detection model on the training set of the kitchen garbage detection data set, calculating a loss value of the kitchen garbage target detection model during training according to the bounding box regression result of the class target, the classification result probability, the preset target real bounding box and the class label, and back-propagating the loss gradient by gradient descent according to the loss value to update the network weights, thereby obtaining the trained kitchen garbage target detection model; wherein the class balance loss function comprises an inter-class balance weighting factor based on the number of samples and a weighting factor based on sample difficulty;

2. The method of claim 1, wherein the feature extraction network comprises a dimension expansion layer, a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer; the dimension expansion layer expands the input image to the dimension required at the input of the first feature extraction layer; the first feature extraction layer comprises 3 ConvNeXt blocks, the second feature extraction layer comprises 3 ConvNeXt blocks, the third feature extraction layer comprises 9 ConvNeXt blocks, and the fourth feature extraction layer comprises 3 ConvNeXt blocks; the output feature maps of the first, second and third feature extraction layers are each down-sampled by a factor of 2 and then input to the corresponding next feature extraction layer; and each ConvNeXt block comprises a depthwise separable convolution layer with a 7 x 7 convolution kernel, a normalization layer, two 1 x 1 convolution layers and a GELU activation function.
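The stage layout recited in this claim can be illustrated with a minimal sketch that tracks the spatial size of the feature map produced by each feature extraction layer. The block counts (3, 3, 9, 3) and the 2x down-sampling between layers follow the claim; the names `STAGE_DEPTHS`, `stage_feature_sizes` and the parameter `stem_stride` are hypothetical, and the default stride of 4 for the dimension expansion layer is an assumption that the claim does not state.

```python
# Block counts per feature extraction layer, as recited in the claim.
STAGE_DEPTHS = (3, 3, 9, 3)


def stage_feature_sizes(height, width, stem_stride=4):
    """Return the (height, width) of the feature map output by each layer.

    stem_stride is an assumed resolution reduction for the dimension
    expansion layer; the 2x down-sampling before layers 2 to 4 follows
    the claim.
    """
    h, w = height // stem_stride, width // stem_stride
    sizes = []
    for index, _depth in enumerate(STAGE_DEPTHS):
        if index > 0:
            h, w = h // 2, w // 2  # 2x down-sampling between layers
        sizes.append((h, w))
    return sizes


if __name__ == "__main__":
    for layer, size in enumerate(stage_feature_sizes(1024, 1024), start=1):
        print(f"feature extraction layer {layer}: {size}")
```

Under these assumptions the four layers yield a pyramid of progressively coarser maps, which is the multi-scale input expected by the feature fusion network.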

3. The method according to claim 2, wherein the feature fusion network comprises a first feature fusion layer, a second feature fusion layer, a third feature fusion layer and a fourth feature fusion layer, and the i-th feature fusion layer is used for enlarging the feature map output by the i-th feature extraction layer to the same size as the feature map output by the previous feature extraction layer by nearest-neighbor interpolation up-sampling, and then fusing it with the feature map output by the previous feature extraction layer through an addition operation, to obtain the fused feature map $L_i$ output by the i-th feature fusion layer.
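The fusion step of this claim (nearest-neighbor up-sampling followed by element-wise addition) can be sketched as below. This is only an illustration on single-channel maps stored as lists of rows; the function names `upsample_nearest` and `fuse` are hypothetical, and the sketch assumes that the coarser map is the one enlarged to match the finer map.

```python
def upsample_nearest(feature, out_h, out_w):
    """Nearest-neighbor resize of a 2-D feature map given as a list of rows."""
    in_h, in_w = len(feature), len(feature[0])
    return [
        [feature[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse(fine, coarse):
    """Enlarge `coarse` to the size of `fine`, then add element-wise."""
    enlarged = upsample_nearest(coarse, len(fine), len(fine[0]))
    return [
        [a + b for a, b in zip(row_fine, row_up)]
        for row_fine, row_up in zip(fine, enlarged)
    ]


if __name__ == "__main__":
    fine_map = [[1, 1, 1, 1] for _ in range(4)]
    coarse_map = [[10, 20], [30, 40]]
    for row in fuse(fine_map, coarse_map):
        print(row)
```

A real implementation would operate on multi-channel tensors and usually applies a 1 x 1 convolution first so that channel counts match; those details are outside what the claim recites.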

4. The method of claim 3, wherein the area suggestion generation network comprises one 3 x 3 convolution and two 1 x 1 convolutions connected in sequence, and in S200 the area suggestion generation network generates candidate boxes from the fused multi-scale features, and positive samples and negative samples are obtained according to the intersection-over-union (IoU) of the candidate boxes and the real bounding box, comprising:

after the fused feature map $L_i$ is input to the area suggestion generation network, a series of candidate boxes is generated, wherein candidate boxes whose intersection-over-union with the real bounding box is greater than 0.7 are defined as positive samples, and candidate boxes whose intersection-over-union with the real bounding box is less than 0.3 are defined as negative samples.
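The sample assignment rule of this claim can be sketched as follows, using the intersection-over-union (IoU) thresholds 0.7 and 0.3 from the claim. The sketch is a simplification under stated assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, whereas the RoI Transformer network suggests that the actual model may handle rotated boxes; candidates with an IoU between the two thresholds are marked as ignored (-1), which the claim does not specify; and the names `iou` and `label_candidates` are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored, assumed) per box."""
    labels = []
    for cand in candidates:
        # Best overlap of this candidate with any real bounding box.
        best = max((iou(cand, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    boxes = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
    print(label_candidates(boxes, gt))
```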

5. The method of claim 3, wherein constructing the L1 regression loss function and the class balance loss function in S300 comprises:

S330: designing a weighting factor based on sample difficulty;

S340: constructing the class balance loss function from the cross-entropy classification loss function, the inter-class balance weighting factor based on the number of samples, and the weighting factor based on sample difficulty.

6. The method of claim 5, wherein S310 comprises:

using the L1 regression loss function:

using a cross-entropy classification loss function:

wherein y is the class label, taking the values 0 and 1 in binary classification; y = 1 indicates that the classification result is consistent with the real label; and P₁ is the probability that the candidate box contains a category target. To simplify the formula, define:

the simplified cross entropy loss function is then:

$L_{CE}(P_t, y) = -\log(P_t)$
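For orientation, the conventional binary forms that lead to the simplified expression above can be written as follows. Here $P$ stands for the predicted probability that the candidate box contains a category target; the piecewise definitions are the standard ones and are assumed here rather than taken from the claim.

```latex
L_{CE}(P, y) =
\begin{cases}
-\log(P), & y = 1 \\
-\log(1 - P), & \text{otherwise}
\end{cases}
\qquad
P_t =
\begin{cases}
P, & y = 1 \\
1 - P, & \text{otherwise}
\end{cases}
\qquad\Longrightarrow\qquad
L_{CE}(P_t, y) = -\log(P_t)
```

Substituting $P_t$ into either branch of the piecewise loss gives the same single-term expression, which is why the simplification holds for both label values.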

7. The method of claim 5, wherein S320 comprises:

defining the total number of classes of the data set as N, the numbers of samples of the class labels in each training batch are $[n_1, n_2, n_3, \ldots, n_N]$, and the number of samples of class y in each training batch is $n_y$;

Defining a sample number normalization factor of the category label y, wherein the formula is as follows:

And is provided with

The sample weighting of class y in each training batch is then defined by the following formula:

And the number of samples are all inversely related;

wherein N is the total number of the categories in the kitchen garbage data set, and i is the serial number of each category.
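The idea of a per-batch inter-class weight that decreases as the class sample count grows can be sketched as below. The form used here is an assumption chosen for illustration, not the formula of this claim: each weight is proportional to $1/n_y$ and the weights of the classes present in the batch are rescaled to average 1. The name `batch_class_weights` is hypothetical.

```python
from collections import Counter


def batch_class_weights(batch_labels, num_classes):
    """Per-batch class weights that shrink as a class's sample count grows.

    Assumed form: weight_y is proportional to 1 / n_y, rescaled so that the
    weights of the classes present in the batch average to 1. Classes that
    do not appear in the batch receive weight 0.0.
    """
    counts = Counter(batch_labels)
    inverse = {y: 1.0 / counts[y] for y in range(num_classes) if counts[y] > 0}
    if not inverse:
        return [0.0] * num_classes
    scale = len(inverse) / sum(inverse.values())
    return [inverse.get(y, 0.0) * scale for y in range(num_classes)]


if __name__ == "__main__":
    # Class 0 is the majority class, class 2 the minority class.
    labels = [0, 0, 0, 0, 1, 1, 2]
    print(batch_class_weights(labels, num_classes=4))
```

Because the counts are taken from the current batch rather than from the whole data set, the weights adapt from batch to batch, which matches the adaptive weighting described for the method.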

8. The method according to claim 5, wherein S330 is specifically:

$\gamma = 2 - \alpha(1 + P_t)$

9. The method according to claim 5, wherein S340 specifically is:

in each training batch,
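How the three ingredients named in S340 might combine can be sketched as follows. The expression for $\gamma$ is the one given in claim 8; everything else is an assumption made for illustration: $\gamma$ is used as a focal-style exponent on $(1 - P_t)$, the inter-class weight multiplies the result, and `alpha=0.5` is a placeholder value. The names `difficulty_exponent` and `class_balance_loss` are hypothetical.

```python
import math


def difficulty_exponent(p_t, alpha):
    """gamma = 2 - alpha * (1 + P_t), as given in claim 8."""
    return 2.0 - alpha * (1.0 + p_t)


def class_balance_loss(p_t, class_weight, alpha=0.5, eps=1e-12):
    """Assumed combination: class_weight * (1 - P_t)**gamma * (-log P_t).

    The focal-style use of gamma and the default alpha are assumptions;
    alpha is expected to be small enough that gamma stays non-negative.
    """
    gamma = difficulty_exponent(p_t, alpha)
    cross_entropy = -math.log(max(p_t, eps))
    return class_weight * (1.0 - p_t) ** gamma * cross_entropy


if __name__ == "__main__":
    # A hard sample (low P_t) contributes far more than an easy one.
    print(class_balance_loss(0.1, class_weight=1.0))
    print(class_balance_loss(0.9, class_weight=1.0))
```

Under these assumptions a well-classified sample is strongly down-weighted, while a minority-class sample with a larger class weight contributes proportionally more to the batch loss.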

10. A kitchen waste target detection system based on a class balance loss function is characterized by comprising:

the kitchen garbage target detection model building module is used for building a kitchen garbage target detection model, wherein the kitchen garbage target detection model comprises a feature extraction network, a feature fusion network, a region suggestion generation network, a RoI Transformer network and a detection head network which are sequentially connected; the feature extraction network is used for extracting multi-scale features of a kitchen garbage image; the feature fusion network is used for fusing the multi-scale features and sending the fused multi-scale features to the region suggestion generation network; the region suggestion generation network is used for generating candidate boxes from the fused multi-scale features, and positive samples and negative samples are obtained according to the intersection-over-union (IoU) of the candidate boxes and the real bounding box; and the RoI Transformer network is used for aligning the candidate box samples, extracting the aligned sample features according to a preset ratio of positive to negative samples, and inputting the aligned sample features to the detection head network to obtain a bounding box regression result and a classification result probability of a category target;

the kitchen waste target detection model training module is used for constructing an L1 regression loss function and a class balance loss function, training the kitchen waste target detection model on the training set of the kitchen waste detection data set, calculating a loss value of the kitchen waste target detection model during training according to the bounding box regression result of the class target, the classification result probability, the preset target real bounding box and the class label, and back-propagating the loss gradient by gradient descent according to the loss value to update the network weights, thereby obtaining the trained kitchen waste target detection model;