CN113887519A - Artificial intelligence-based garbage throwing identification method, device, medium and server - Google Patents

Artificial intelligence-based garbage throwing identification method, device, medium and server

Info

Publication number
CN113887519A
Authority
CN
China
Prior art keywords
garbage
image frame
training
original image
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111277749.7A
Other languages
Chinese (zh)
Inventor
刘荣荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202111277749.7A priority Critical patent/CN113887519A/en
Publication of CN113887519A publication Critical patent/CN113887519A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B65CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65FGATHERING OR REMOVAL OF DOMESTIC OR LIKE REFUSE
    • B65F1/00Refuse receptacles; Accessories therefor
    • B65F1/14Other constructional features; Accessories
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B65CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65FGATHERING OR REMOVAL OF DOMESTIC OR LIKE REFUSE
    • B65F1/00Refuse receptacles; Accessories therefor
    • B65F1/0033Refuse receptacles; Accessories therefor specially adapted for segregated refuse collecting, e.g. receptacles with several compartments; Combination of receptacles
    • B65F2001/008Means for automatically selecting the receptacle in which refuse should be placed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B65CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65FGATHERING OR REMOVAL OF DOMESTIC OR LIKE REFUSE
    • B65F2210/00Equipment of refuse receptacles
    • B65F2210/138Identification means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence, and particularly relates to a garbage putting identification method and device, a computer readable storage medium and a server. The method comprises the following steps: acquiring an original image frame of a target area through a camera device; carrying out human body joint point identification on the original image frame through a human body joint point identification model to obtain the position coordinates of human body elbow joint points; carrying out garbage bag identification on the original image frame through a garbage bag identification model to obtain position coordinates of a garbage bag; judging whether a garbage throwing action occurs according to the position coordinates of the human elbow joint points and the position coordinates of the garbage bag; if the garbage throwing action occurs, judging whether the garbage bag is thrown into the garbage can or not according to the position coordinate of the garbage bag; and if the garbage is thrown into the garbage can, determining the garbage type in the garbage bag through a garbage type identification model, and judging whether the garbage throwing is correct or not.

Description

Artificial intelligence-based garbage throwing identification method, device, medium and server
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a garbage putting identification method and device, a computer readable storage medium and a server.
Background
The garbage throwing point is an important public facility integrating functions such as garbage collection, storage, and transfer, and is closely related to citizens' daily life. Whether garbage throwing points can be managed effectively is a matter of public welfare that affects every household, and it is receiving more and more attention.
In order to improve garbage disposal capability, household garbage needs to be sorted when it is thrown away. Although garbage classification has been promoted and advocated in China for many years, the results remain unsatisfactory due to insufficient supervision. With the introduction of mandatory household-garbage classification policies, the traditional manual supervision mode consumes considerable manpower, is inefficient, and can hardly meet current practical needs.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for identifying a garbage placement, a computer-readable storage medium, and a server, so as to solve the problems in the prior art that much manpower is consumed, efficiency is low, and current actual requirements are difficult to meet.
A first aspect of an embodiment of the present invention provides a method for identifying garbage placement, which may include:
acquiring an original image frame of a target area through a preset camera device;
carrying out human body joint point recognition on the original image frame through a preset human body joint point recognition model to obtain the position coordinates of human body elbow joint points;
carrying out garbage bag identification on the original image frame through a preset garbage bag identification model to obtain position coordinates of the garbage bag;
judging whether a garbage throwing action occurs according to the position coordinates of the human elbow joint points and the position coordinates of the garbage bag;
if the garbage throwing action occurs, judging whether the garbage bag is thrown into a garbage can of the garbage throwing point or not according to the position coordinate of the garbage bag;
if the garbage is thrown into the garbage can, the garbage type in the garbage bag is determined through a preset garbage type recognition model, and whether the garbage throwing is correct or not is judged according to the garbage type.
In a specific implementation manner of the first aspect, after acquiring an original image frame of a target area by a preset imaging device, the method further includes:
performing image compression on the original image frame through a preset image compression model to obtain a compressed image frame corresponding to the original image frame;
the image compression model comprises a first convolutional neural network, an intermediate processing network and a second convolutional neural network, the original image frame is subjected to image compression through a preset image compression model to obtain a compressed image frame corresponding to the original image frame, and the method comprises the following steps:
performing convolution and downsampling processing on the original image frame by using the first convolution neural network to obtain a first processing result;
processing the first processing result by using a residual error module preset in the intermediate processing network to obtain a second processing result;
and performing convolution and up-sampling processing on the second processing result by using the second convolution neural network to obtain a compressed image frame corresponding to the original image frame.
In a specific implementation manner of the first aspect, the training process of the image compression model includes:
acquiring a training sample set from a preset training sample database; the training sample set comprises training samples, each training sample comprises an original image frame and an expected output image frame, and the expected output image frames in each training sample are in one-to-one correspondence with the original image frames;
inputting the original image frames in each training sample into the image compression model for image compression to obtain actual output image frames;
calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample;
if the first training loss value is larger than a preset first threshold value, adjusting the model parameters of the image compression model until the first training loss value is smaller than or equal to the first threshold value.
In a specific implementation manner of the first aspect, the calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample includes:
calculating the first training loss value according to:
$$ \mathrm{Loss1} = \frac{1}{N \cdot PixN} \sum_{n=1}^{N} \sum_{pix=1}^{PixN} \left( s_{n,pix} - y_{n,pix} \right)^{2} $$
where n is the serial number of the training sample, 1 ≤ n ≤ N, N is the total number of training samples; pix is the serial number of the pixel, 1 ≤ pix ≤ PixN, PixN is the total number of pixels in the image; s_{n,pix} is the value of the pix-th pixel of the expected output image frame of the n-th training sample; y_{n,pix} is the value of the pix-th pixel of the actual output image frame of the n-th training sample; and Loss1 is the first training loss value.
In a specific implementation manner of the first aspect, the training process of the human joint point recognition model includes:
acquiring a preset training sample set; the training sample set comprises sample images, and each sample image corresponds to a pre-labeled first set;
respectively carrying out human body joint point recognition on each sample image in the training sample set by using the human body joint point recognition model to obtain a second set corresponding to each sample image;
calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image;
and if the second training loss value is greater than a preset second threshold value, adjusting the model parameters of the human body joint point identification model until the second training loss value is less than or equal to the second threshold value.
In a specific implementation manner of the first aspect, the calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image includes:
calculating the second training loss value according to:
$$ \mathrm{Loss2} = \frac{1}{M \cdot PN} \sum_{m=0}^{M-1} \sum_{p=0}^{PN-1} \sqrt{ \left( FtX_{m,p} - SdX_{m,p} \right)^{2} + \left( FtY_{m,p} - SdY_{m,p} \right)^{2} } $$
where m is the serial number of the sample image, 0 ≤ m ≤ M−1, M is the total number of sample images; p is the serial number of the human body joint point, 0 ≤ p ≤ PN−1, PN is the number of human body joint points; FtX_{m,p} and FtY_{m,p} are the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the first set corresponding to the m-th sample image; SdX_{m,p} and SdY_{m,p} are the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the second set corresponding to the m-th sample image; and Loss2 is the second training loss value.
In a specific implementation manner of the first aspect, the performing trash bag identification on the original image frame through a preset trash bag identification model to obtain a position coordinate of a trash bag includes:
inputting the original image frame into the garbage bag identification model, and acquiring an identification result output by the garbage bag identification model; the identification result is a binary image;
respectively acquiring the position coordinates of each pixel which takes a preset first value as a value in the identification result;
calculating the position coordinates of the garbage bag according to the following formula:
$$ PosX = \frac{1}{G} \sum_{g=1}^{G} PosX_{g}, \qquad PosY = \frac{1}{G} \sum_{g=1}^{G} PosY_{g} $$
where g is the serial number of a pixel whose value is the first value, 1 ≤ g ≤ G, G is the total number of pixels whose value is the first value; PosX_g and PosY_g are the horizontal-axis and vertical-axis coordinates of the g-th pixel whose value is the first value; and PosX and PosY are the horizontal-axis and vertical-axis coordinates of the garbage bag, respectively.
A second aspect of an embodiment of the present invention provides a device for identifying trash placement, which may include:
the original image frame acquisition module is used for acquiring an original image frame of a target area through a preset camera device;
the human body joint point identification module is used for carrying out human body joint point identification on the original image frame through a preset human body joint point identification model to obtain the position coordinates of human body elbow joint points;
the garbage bag identification module is used for identifying the garbage bags in the original image frame through a preset garbage bag identification model to obtain position coordinates of the garbage bags;
the garbage throwing action judging module is used for judging whether a garbage throwing action occurs according to the position coordinates of the human body elbow joint points and the position coordinates of the garbage bags;
the garbage bag feeding judging module is used for judging whether the garbage bag is fed into the garbage can of the garbage feeding point or not according to the position coordinate of the garbage bag if a garbage feeding action occurs;
and the garbage throwing judging module is used for determining the garbage type in the garbage bag through a preset garbage type identification model if the garbage is thrown into the garbage can, and judging whether the garbage throwing is correct according to the garbage type.
In a specific implementation manner of the second aspect, the garbage throwing identification device may further include:
the image compression module is used for carrying out image compression on the original image frame through a preset image compression model to obtain a compressed image frame corresponding to the original image frame;
the image compression model comprises a first convolutional neural network, an intermediate processing network and a second convolutional neural network, and the image compression module is specifically configured to: performing convolution and downsampling processing on the original image frame by using the first convolution neural network to obtain a first processing result; processing the first processing result by using a residual error module preset in the intermediate processing network to obtain a second processing result; and performing convolution and up-sampling processing on the second processing result by using the second convolution neural network to obtain a compressed image frame corresponding to the original image frame.
In a specific implementation manner of the second aspect, the garbage throwing identification device may further include:
the image compression model training module is used for acquiring a training sample set from a preset training sample database; the training sample set comprises training samples, each training sample comprises an original image frame and an expected output image frame, and the expected output image frames in each training sample are in one-to-one correspondence with the original image frames; inputting the original image frames in each training sample into the image compression model for image compression to obtain actual output image frames; calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample; if the first training loss value is larger than a preset first threshold value, adjusting the model parameters of the image compression model until the first training loss value is smaller than or equal to the first threshold value. The calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample comprises: calculating the first training loss value according to:
$$ \mathrm{Loss1} = \frac{1}{N \cdot PixN} \sum_{n=1}^{N} \sum_{pix=1}^{PixN} \left( s_{n,pix} - y_{n,pix} \right)^{2} $$
where n is the serial number of the training sample, 1 ≤ n ≤ N, N is the total number of training samples; pix is the serial number of the pixel, 1 ≤ pix ≤ PixN, PixN is the total number of pixels in the image; s_{n,pix} is the value of the pix-th pixel of the expected output image frame of the n-th training sample; y_{n,pix} is the value of the pix-th pixel of the actual output image frame of the n-th training sample; and Loss1 is the first training loss value.
In a specific implementation manner of the second aspect, the garbage throwing identification device may further include:
the human body joint point recognition model training module is used for acquiring a preset training sample set; the training sample set comprises sample images, and each sample image corresponds to a pre-labeled first set; respectively carrying out human body joint point recognition on each sample image in the training sample set by using the human body joint point recognition model to obtain a second set corresponding to each sample image; calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image; and if the second training loss value is greater than a preset second threshold value, adjusting the model parameters of the human body joint point identification model until the second training loss value is less than or equal to the second threshold value. The calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image includes: calculating the second training loss value according to:
$$ \mathrm{Loss2} = \frac{1}{M \cdot PN} \sum_{m=0}^{M-1} \sum_{p=0}^{PN-1} \sqrt{ \left( FtX_{m,p} - SdX_{m,p} \right)^{2} + \left( FtY_{m,p} - SdY_{m,p} \right)^{2} } $$
where m is the serial number of the sample image, 0 ≤ m ≤ M−1, M is the total number of sample images; p is the serial number of the human body joint point, 0 ≤ p ≤ PN−1, PN is the number of human body joint points; FtX_{m,p} and FtY_{m,p} are the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the first set corresponding to the m-th sample image; SdX_{m,p} and SdY_{m,p} are the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the second set corresponding to the m-th sample image; and Loss2 is the second training loss value.
In a specific implementation manner of the second aspect, the trash bag identification module is specifically configured to: inputting the original image frame into the garbage bag identification model, and acquiring an identification result output by the garbage bag identification model; the identification result is a binary image; respectively acquiring the position coordinates of each pixel which takes a preset first value as a value in the identification result; calculating the position coordinates of the garbage bag according to the following formula:
$$ PosX = \frac{1}{G} \sum_{g=1}^{G} PosX_{g}, \qquad PosY = \frac{1}{G} \sum_{g=1}^{G} PosY_{g} $$
where g is the serial number of a pixel whose value is the first value, 1 ≤ g ≤ G, G is the total number of pixels whose value is the first value; PosX_g and PosY_g are the horizontal-axis and vertical-axis coordinates of the g-th pixel whose value is the first value; and PosX and PosY are the horizontal-axis and vertical-axis coordinates of the garbage bag, respectively.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of any one of the above-mentioned garbage throwing identification methods.
A fourth aspect of the embodiments of the present invention provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the garbage placement identification methods when executing the computer program.
A fifth aspect of the embodiments of the present invention provides a computer program product which, when run on a server, causes the server to perform the steps of any one of the above-mentioned garbage throwing identification methods.
Compared with the prior art, the embodiment of the invention has the following beneficial effects. In the embodiment of the invention, an original image frame of the target area is collected by a preset camera device; human body joint point identification is performed on the original image frame through a preset human body joint point identification model to obtain the position coordinates of the human elbow joint point; garbage bag identification is performed on the original image frame through a preset garbage bag identification model to obtain the position coordinates of the garbage bag; whether a garbage throwing action occurs is judged according to the position coordinates of the human elbow joint point and the position coordinates of the garbage bag; if a garbage throwing action occurs, whether the garbage bag is thrown into the garbage can of the garbage throwing point is judged according to the position coordinates of the garbage bag; and if so, the garbage type in the garbage bag is determined through a preset garbage type identification model, and whether the garbage is thrown correctly is judged according to the garbage type. Through the embodiment of the invention, the input cost of manual management can be greatly reduced, management efficiency is improved, and current practical needs are effectively met.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a flowchart of an embodiment of a method for identifying a garbage input according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of image compression performed on an original image frame by a preset image compression model to obtain a compressed image frame corresponding to the original image frame;
fig. 3 is a structural diagram of an embodiment of a garbage throwing recognition apparatus according to an embodiment of the present invention;
fig. 4 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The execution subject of the embodiment of the invention can be a server based on artificial intelligence, and is used for executing the junk placement identification method in the embodiment of the invention. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
Referring to fig. 1, an embodiment of a method for identifying spam delivery in an embodiment of the present invention may include:
and S101, acquiring an original image frame of a target area through a preset camera device.
The target area is the area where the garbage throwing point is located. The camera device can be arranged near a garbage throwing point in a residential community, an office, or a public place, and is used to continuously capture video of the garbage throwing point and generate a video stream. The captured video may be in ASF, AVI, MPEG, MOV, or other formats, and each frame of the video can be recorded as an original image frame.
In a specific implementation of the embodiment of the invention, the monitoring camera at the garbage throwing point in the security monitoring system can be directly used as the camera shooting device, and a new monitoring camera does not need to be built again, so that the method is more convenient and saves the cost.
Optionally, in a specific implementation of the embodiment of the present invention, in order to reduce the amount of computation for performing image recognition subsequently, after the original image frame is acquired, the original image frame may further be subjected to image compression by using a preset image compression model, so as to obtain a compressed image frame corresponding to the original image frame.
The image compression model comprises three parts, namely a first convolutional neural network, an intermediate processing network and a second convolutional neural network, as shown in fig. 2, a specific processing process may include the following steps:
step S1011, performing convolution and down-sampling processing on the original image frame by using the first convolution neural network to obtain a first processing result.
The number of layers of the first convolutional neural network is denoted as LayerNum1, and the specific value of LayerNum1 may be set according to the actual situation, in the embodiment of the present invention, it is preferable to set LayerNum1 to 3, and these three layers are denoted as DL1, DL2, and DL3 in sequence. Wherein DL1 is the first layer of the first convolutional neural network, and its input is the original image frame, and its output is the result of the original image frame being processed by convolution and downsampling in DL1, where the output of DL1 is denoted as DL1_ Res; DL2 is the second layer of the first convolutional neural network, its input is DL1_ Res, and its output is the result of DL1_ Res being processed by convolution and downsampling in DL2, here, the output of DL2 is denoted as DL2_ Res; DL3 is the third layer of the first convolutional neural network, its input is DL2_ Res, and its output is the result of DL2_ Res through the convolution and down-sampling process in DL 3.
The specific processing in DL1 is as follows. The original image frame has 3 channels, namely R (red), G (green), and B (blue). The convolution kernels in DL1 perform convolution on the original image frame to obtain a feature map of the original image frame; the number of convolution kernels in DL1 may be set according to the actual situation, and in this embodiment of the invention it is preferably set to 8, so the convolution in DL1 produces an 8-channel feature map. The feature map is then processed by an activation function (for example, the rectified linear unit (ReLU) may be used as the activation function), its values are limited to the range [0,1], and it is then downsampled to reduce its scale; for example, downsampling may reduce the length and width of the feature map to half of their original size. The feature map after the downsampling in DL1 is used as the input of DL2.
The processing procedures of DL2 and DL3 are similar to DL1, and are not described herein. Note, however, that the number of convolution kernels in DL2 is 2 times the number of convolution kernels in DL1, and the number of convolution kernels in DL3 is 2 times the number of convolution kernels in DL2, so that the number of channels of feature maps output by DL1, DL2, and DL3 is 8, 16, and 32 in this order. Finally, the feature map output by DL3 is the first processing result.
Step S1012, processing the first processing result using a residual error module preset in the intermediate processing network, to obtain a second processing result.
The intermediate processing network comprises a residual error module, the structure of the residual error module comprises two branch lines, wherein a first branch line is used for extracting deeper features of the first processing result, a second branch line is used for maintaining the first processing result, the first processing result in the first branch line can be sequentially subjected to convolution processing, ReLU function processing, convolution processing and other processes, the number of convolution kernels subjected to convolution processing in the first branch line is the same as that of convolution kernels in DL3, and therefore the number of channels of a feature map is kept unchanged in the whole processing process; and in the second branch line, the first processing result can skip the processing process in the first branch line in a jump connection mode, and the data on the two branch lines are weighted and superposed to obtain the second processing result. By the processing mode, the high-frequency characteristic of the data can be effectively kept, and the problems of gradient disappearance and gradient explosion possibly caused by deepening of the network depth are solved, so that the deeper neural network can be trained, and meanwhile, the good performance can be ensured. In the embodiment of the invention, the image features of a deeper layer in the original image frame are extracted from the first branch line, and the image features of a shallower layer in the original image frame are kept in the second branch line.
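For illustration, a minimal Python (PyTorch) sketch of this two-branch residual module might look as follows; the 3×3 kernel size and the equal weighting of the two branches are assumptions, as the text only specifies a weighted superposition.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """Two-branch residual module: branch 1 extracts deeper features
    (conv -> ReLU -> conv), branch 2 keeps the input via a skip connection,
    and the two branches are combined by a weighted superposition."""

    def __init__(self, channels: int = 32, branch_weight: float = 0.5):
        super().__init__()
        # Same number of kernels as DL3, so the channel count stays unchanged.
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.branch_weight = branch_weight  # assumed weighting, not specified in the text

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        deep = self.branch1(x)  # first branch: deeper features of the first processing result
        return self.branch_weight * deep + (1.0 - self.branch_weight) * x  # weighted superposition
```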
And S1013, performing convolution and up-sampling processing on the second processing result by using the second convolution neural network to obtain a compressed image frame corresponding to the original image frame.
The number of layers of the second convolutional neural network is the same as that of the first convolutional neural network, and hereinafter, a case where the number of layers is 3 is described as an example, and these three layers are sequentially referred to as UL1, UL2, and UL 3. Wherein, UL1 is the first layer of the second convolutional neural network, and its input is the second processing result and its output is the result of the second processing result after the convolution and upsampling processing in UL1, here, the output of UL1 is denoted as UL1_ Res; UL2 is the second layer of the second convolutional neural network, and its input is UL1_ Res, and its output is the result of UL1_ Res after the convolution and upsampling process in UL2, here the output of UL2 is denoted as UL2_ Res; UL3 is the third layer of the second convolutional neural network, whose input is UL2_ Res and output is the result of UL2_ Res through the convolution and upsampling process in UL 3.
The specific processing in UL1 is as follows. The number of convolution kernels in UL1 is the same as that in DL2, so the convolution in UL1 turns the second processing result into a 16-channel feature map. The feature map is then processed by an activation function (for example, ReLU may be used as the activation function), its values are limited to the range [0,1], and it is then upsampled to expand its scale; for example, upsampling may double the length and width of the feature map. The feature map after the upsampling in UL1 is used as the input of UL2.
The processing procedures of UL2 and UL3 are similar to UL1 and are not described herein. However, it should be noted that the number of convolution kernels in UL2 is half of the number of convolution kernels in UL1, and the number of convolution kernels in UL3 is 3, so that the number of channels of the feature map output by UL1, UL2 and UL3 is 16, 8 and 3 in this order. Finally, the feature map output by UL3 may be further processed by 1 convolution processing (the number of convolution kernels is 3) and 1 activation function (for example, Sigmoid may be used as an activation function), so as to obtain a compressed image frame corresponding to the original image frame. It should be noted that, a jump connection is also introduced between the first convolutional neural network and the second convolutional neural network, and before each convolution processing in the second convolutional neural network, data to be subjected to convolution processing is superimposed with an output result of the same number of channels in the first convolutional neural network, and the superimposed result is used as an input of the next convolution processing immediately after the superimposition processing.
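Combining the three parts, a minimal Python (PyTorch) sketch of the whole image compression model might look like the following; the 8/16/32 channel progression follows the description above, while the kernel sizes, the pooling and interpolation operators, and the equal weighting of the superpositions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageCompressionModel(nn.Module):
    """Sketch of the first convolutional neural network (DL1-DL3), the intermediate
    residual module, and the second convolutional neural network (UL1-UL3)."""

    def __init__(self):
        super().__init__()
        # First convolutional neural network: channels 3 -> 8 -> 16 -> 32.
        self.dl1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.dl2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.dl3 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        # Intermediate processing network: two-branch residual module.
        self.res_conv1 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.res_conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        # Second convolutional neural network: channels 32 -> 16 -> 8 -> 3.
        self.ul1 = nn.Conv2d(32, 16, kernel_size=3, padding=1)
        self.ul2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.ul3 = nn.Conv2d(8, 3, kernel_size=3, padding=1)
        self.out_conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Encoder: convolution, ReLU, then halve height and width (downsampling operator assumed).
        d1 = F.max_pool2d(F.relu(self.dl1(x)), 2)     # 8-channel feature map
        d2 = F.max_pool2d(F.relu(self.dl2(d1)), 2)    # 16-channel feature map
        d3 = F.max_pool2d(F.relu(self.dl3(d2)), 2)    # 32-channel first processing result
        # Residual module: deeper features on branch 1, skip on branch 2, weighted superposition.
        deep = self.res_conv2(F.relu(self.res_conv1(d3)))
        r = 0.5 * deep + 0.5 * d3                      # second processing result
        # Decoder: superimpose with the encoder output of the same channel count before each
        # convolution, then convolution, ReLU, and doubling of height and width.
        u1 = F.interpolate(F.relu(self.ul1(r + d3)), scale_factor=2.0)
        u2 = F.interpolate(F.relu(self.ul2(u1 + d2)), scale_factor=2.0)
        u3 = F.interpolate(F.relu(self.ul3(u2 + d1)), scale_factor=2.0)
        return torch.sigmoid(self.out_conv(u3))        # compressed image frame

# Usage sketch: compress one 3-channel frame whose sides are divisible by 8.
frame = torch.rand(1, 3, 240, 320)
compressed = ImageCompressionModel()(frame)            # shape (1, 3, 240, 320)
```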
The image compression model is obtained based on sample training, and before being put into use, the image compression model can be trained in advance through the following processes:
step S201, a training sample set is obtained from a preset training sample database.
The training sample set comprises training samples, each training sample comprises an original image frame and an expected output image frame, and the expected output image frames in each training sample are in one-to-one correspondence with the original image frames.
Step S202, inputting the original image frames in each training sample into the image compression model for image compression, and obtaining actual output image frames.
The detailed process of step S202 can refer to the detailed descriptions of step S1011 to step S1013, and the detailed description thereof is omitted here.
Step S203, calculating a first training loss value between the expected output image frame and the actual output image frame in each training sample.
Specifically, the first training loss value may be calculated according to the following equation:
$$ \mathrm{Loss1} = \frac{1}{N \cdot PixN} \sum_{n=1}^{N} \sum_{pix=1}^{PixN} \left( s_{n,pix} - y_{n,pix} \right)^{2} $$
where n is the serial number of the training sample, 1 ≤ n ≤ N, N is the total number of training samples; pix is the serial number of the pixel, 1 ≤ pix ≤ PixN, PixN is the total number of pixels in the image; s_{n,pix} is the value of the pix-th pixel of the expected output image frame of the n-th training sample; y_{n,pix} is the value of the pix-th pixel of the actual output image frame of the n-th training sample; and Loss1 is the first training loss value.
And step S204, judging whether the first training loss value is larger than a preset first threshold value.
If the first training loss value is greater than the first threshold, step S205 is performed, and if the first training loss value is less than or equal to the first threshold, step S206 is performed.
And S205, adjusting the model parameters of the image compression model.
After the parameter adjustment is completed, the method returns to step S202, i.e., continues to train the image compression model using the training sample set until the first training loss value is less than or equal to the first threshold.
And S206, finishing the training of the image compression model.
When the first training loss value is less than or equal to the first threshold, it indicates that the training has reached the predetermined effect, and the training may be ended. At this time, the determined image compression model is trained by a large number of samples, and the first training loss value is kept in a small range.
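As an illustrative sketch, assuming PyTorch and a pixel-wise mean squared error for the first training loss value, the training procedure of steps S201 to S206 could be organised roughly as follows; the batch size, learning rate, optimizer, and first threshold value are all assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_compression_model(model: nn.Module,
                            originals: torch.Tensor,   # (N, 3, H, W) original image frames
                            expected: torch.Tensor,    # (N, 3, H, W) expected output image frames
                            first_threshold: float = 1e-3,
                            max_rounds: int = 1000) -> nn.Module:
    """Train until the first training loss value Loss1 drops to the first threshold."""
    loader = DataLoader(TensorDataset(originals, expected), batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()  # assumed pixel-wise loss; averages over samples and pixels

    for _ in range(max_rounds):
        total, batches = 0.0, 0
        for x, s in loader:
            y = model(x)            # actual output image frame
            loss1 = loss_fn(y, s)   # first training loss value
            optimizer.zero_grad()
            loss1.backward()        # adjust the model parameters
            optimizer.step()
            total += loss1.item()
            batches += 1
        if total / batches <= first_threshold:  # steps S204/S206: training finished
            break
    return model
```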
It should be noted that, if the original image frame is not subjected to image compression after being acquired, the original image frames used in the subsequent steps S102 to S106 all refer to the original image frame that is not subjected to image compression. If the original image frame is subjected to image compression after being acquired, the original image frames used in the subsequent steps S102 to S106 all refer to compressed image frames obtained after image compression.
And S102, carrying out human body joint point identification on the original image frame through a preset human body joint point identification model to obtain the position coordinates of the human body elbow joint points.
The human body joint point identification model is a neural network model trained in advance for human body joint point identification, and the specific type of neural network model to use can be set according to the actual situation. In this embodiment of the invention, OpenPose is preferably used as the human body joint point identification model. OpenPose is an open-source model built on convolutional neural networks and supervised learning, using Caffe as its framework; it can track a person's facial expression, trunk, limbs, and even fingers, works for a single person as well as multiple people, and has good robustness. OpenPose is used to identify the human body key points in the original image frame, obtaining a position coordinate set corresponding to the original image frame. The position coordinate set includes the position coordinates of various human body joint points, which may include, but are not limited to, joint points such as the neck, left elbow, right elbow, left ankle, and right ankle. In this embodiment of the invention, other human body joint point identification models may also be used according to the actual situation, including but not limited to the Convolutional Pose Machine (CPM), the Stacked Hourglass Network (SHN), the Cascaded Pyramid Network (CPN), and so on. Human body joint point identification is performed on the original image frame through any such model to obtain the position coordinate set corresponding to the original image frame. In this way, the actions of the person throwing garbage can be reduced to position changes of human body key points, which makes it convenient to identify garbage throwing behavior quickly.
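The following schematic sketch (in Python) shows how the elbow joint coordinates could be pulled out of the position coordinate set; `run_pose_model` is a hypothetical stand-in for whichever pose-estimation backend (OpenPose, CPM, SHN, CPN, etc.) is deployed, and the joint names are assumptions.

```python
from typing import Dict, Tuple

# Hypothetical joint names of interest; real models expose their own keypoint layouts.
ELBOW_JOINTS = ("left_elbow", "right_elbow")

def extract_elbow_coordinates(image_frame,
                              run_pose_model) -> Dict[str, Tuple[float, float]]:
    """Run a pose-estimation model on one image frame and keep only the elbow joints.

    `run_pose_model(image_frame)` is assumed to return a mapping from joint name
    to (x, y) position coordinates for the detected person in the frame.
    """
    joint_coords = run_pose_model(image_frame)  # position coordinate set
    return {name: joint_coords[name]
            for name in ELBOW_JOINTS if name in joint_coords}
```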
The human body joint point recognition model is obtained based on sample training, and before being put into use, the human body joint point recognition model can be trained in advance through the following processes:
and S301, acquiring a preset training sample set.
The training sample set includes sample images, each sample image corresponds to a pre-labeled position coordinate set, which is referred to as a first set, and the first set may be stored as a json file, where the position coordinates of each human body joint in the corresponding sample image are included. The training sample set may employ various public picture data sets including, but not limited to, MSCOCO, ImageNet, Open Images Dataset, CIFAR-10, and others.
Step S302, a human body joint point recognition model is used for processing each sample image in the training sample set respectively to obtain a second set corresponding to each sample image.
For each sample image, the corresponding second set also includes the position coordinates of the respective human body joint points in the sample image. It should be noted that the first set is the expected output of the labels, and the second set is the actual output.
Step S303, calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image.
Specifically, a second training loss value for the set of training samples may be calculated according to:
$$ \mathrm{Loss2} = \frac{1}{M \cdot PN} \sum_{m=0}^{M-1} \sum_{p=0}^{PN-1} \sqrt{ \left( FtX_{m,p} - SdX_{m,p} \right)^{2} + \left( FtY_{m,p} - SdY_{m,p} \right)^{2} } $$
where m is the serial number of the sample image, 0 ≤ m ≤ M−1, M is the total number of sample images; p is the serial number of the human body joint point, 0 ≤ p ≤ PN−1, PN is the number of human body joint points; FtX_{m,p} and FtY_{m,p} are the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the first set corresponding to the m-th sample image; SdX_{m,p} and SdY_{m,p} are the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the second set corresponding to the m-th sample image; and Loss2 is the second training loss value of the training sample set.
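For illustration only, assuming Loss2 is the mean Euclidean distance between labeled and predicted joint coordinates as in the formula above, it could be computed as follows:

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]

def second_training_loss(first_set: List[List[Coord]],
                         second_set: List[List[Coord]]) -> float:
    """Mean Euclidean distance between labeled (first set) and predicted (second set)
    joint coordinates over M sample images and PN joints per image (assumed form of Loss2)."""
    total, count = 0.0, 0
    for labeled, predicted in zip(first_set, second_set):       # m = 0 .. M-1
        for (ftx, fty), (sdx, sdy) in zip(labeled, predicted):  # p = 0 .. PN-1
            total += math.hypot(ftx - sdx, fty - sdy)
            count += 1
    return total / count  # Loss2
```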
Step S304, determining whether the second training loss value is greater than a preset second threshold.
The specific value of the second threshold may be set according to an actual situation, if the second training loss value is greater than the second threshold, step S305 is executed, and if the second training loss value is less than or equal to the second threshold, step S306 is executed.
Step S305, adjusting model parameters of the human body joint point identification model.
After the parameter adjustment is completed, the process returns to step S302, i.e., the next round of training process is started until the second training loss value is less than or equal to the second threshold.
And S306, finishing training to obtain the pre-trained human body joint point identification model.
When the second training loss value is less than or equal to the second threshold value, it indicates that the human joint point recognition model has converged, and the training may be ended at this time, where the human joint point recognition model is the pre-trained human joint point recognition model.
After the pre-trained human body joint point recognition model is obtained, the pre-trained human body joint point recognition model can be used for processing the original image frame, so that a position coordinate set corresponding to the original image frame is obtained.
Further, considering that the position coordinates of the human body joint points output by the human body joint point identification model may sometimes jump, in this embodiment of the invention, after the position coordinates of each human body joint point are obtained, it is preferable to perform Kalman filtering on the position coordinates of each human body joint point in the position coordinate set, so as to obtain a filtered position coordinate set and ensure its smoothness. In the subsequent process of this embodiment of the invention, the position coordinates of the human elbow joint points are mainly used.
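A minimal sketch of such smoothing, assuming a one-dimensional random-walk Kalman filter applied independently to each coordinate (the noise parameters are assumptions):

```python
class ScalarKalmanFilter:
    """Minimal 1-D Kalman filter (random-walk model) for smoothing one coordinate
    of a joint point across frames; process/measurement noise values are assumptions."""

    def __init__(self, process_noise: float = 1e-2, measurement_noise: float = 1.0):
        self.q = process_noise
        self.r = measurement_noise
        self.x = None   # filtered coordinate estimate
        self.p = 1.0    # estimate variance

    def update(self, measurement: float) -> float:
        if self.x is None:               # initialise with the first observation
            self.x = measurement
            return self.x
        self.p += self.q                 # predict step (state assumed constant)
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (measurement - self.x)
        self.p *= (1.0 - k)
        return self.x

# Usage sketch: one filter per coordinate of the elbow joint point.
fx, fy = ScalarKalmanFilter(), ScalarKalmanFilter()
smoothed = [(fx.update(x), fy.update(y))
            for (x, y) in [(120.0, 88.0), (123.5, 87.0), (150.0, 86.5)]]
```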
And S103, carrying out garbage bag identification on the original image frame through a preset garbage bag identification model to obtain the position coordinates of the garbage bags.
The garbage bag identification model is a neural network model which is trained in advance and used for garbage bag identification, and specifically which type of neural network model is adopted can be set according to actual conditions. In the embodiment of the present invention, a BP neural network is preferably used as the garbage bag identification model, and other neural network models may be selected, which is not specifically limited in the embodiment of the present invention.
Before the garbage bag recognition model is used, a training data set for training the garbage bag recognition model can be constructed, and then the initial garbage bag recognition model is trained by using the training data set until a preset training condition is met, so that the trained garbage bag recognition model is obtained.
The training sample set comprises training samples, each training sample comprises a frame of image to be recognized and an expected output image, and the expected output images in the training samples are in one-to-one correspondence with the image to be recognized.
It should be noted that the expected output image is a binarized image in which the pixels corresponding to the garbage bag region take a preset first value and the pixels corresponding to the non-garbage-bag region take a preset second value. The specific values of the first value and the second value may be set according to the actual situation; for example, the first value may be set to 1 and the second value to 0. Of course, the first value and the second value may also be set to other values, which is not specifically limited in this embodiment of the invention.
For a specific training process, reference may be made to the specific description in steps S201 to S206, which is not described again in this embodiment of the present invention.
After training is finished, the original image frames can be input into a garbage bag recognition model, and recognition results output by the garbage bag recognition model are obtained. The recognition result is a binary image, and the position coordinates of the pixels taking the first value in the image are respectively obtained and recorded as:
(PosX_1, PosY_1), (PosX_2, PosY_2), …, (PosX_g, PosY_g), …, (PosX_G, PosY_G)
where g is the serial number of a pixel whose value is the first value, 1 ≤ g ≤ G, G is the total number of pixels whose value is the first value, and PosX_g and PosY_g are the horizontal-axis and vertical-axis coordinates of the g-th pixel whose value is the first value (for example, when g = 3, PosX_3 and PosY_3 are the horizontal-axis and vertical-axis coordinates of the 3rd pixel whose value is the first value). The position coordinates of the garbage bag can then be calculated according to the following formula:
$$ PosX = \frac{1}{G} \sum_{g=1}^{G} PosX_{g}, \qquad PosY = \frac{1}{G} \sum_{g=1}^{G} PosY_{g} $$
where PosX and PosY are the horizontal-axis and vertical-axis coordinates of the garbage bag, respectively.
For example, if G takes a value of 5, there are 5 pixels taking a value of the first value, and the position coordinates of these pixels are: (6,8), (7,7), (7,8), (7,9) and (8,8), the value of the position coordinate (PosX, PosY) of the garbage bag can be calculated to be (7, 8).
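A short NumPy sketch of this centroid computation, reproducing the worked example above (assuming the first value is 1):

```python
import numpy as np

def garbage_bag_position(binary_mask: np.ndarray, first_value: int = 1):
    """Average the coordinates of all pixels equal to the first value
    to obtain the position coordinates (PosX, PosY) of the garbage bag."""
    ys, xs = np.nonzero(binary_mask == first_value)
    if xs.size == 0:
        return None  # no garbage bag pixels recognised
    return float(xs.mean()), float(ys.mean())

# The worked example from the text: pixels (6,8), (7,7), (7,8), (7,9), (8,8) -> (7, 8).
mask = np.zeros((12, 12), dtype=np.uint8)
for x, y in [(6, 8), (7, 7), (7, 8), (7, 9), (8, 8)]:
    mask[y, x] = 1
print(garbage_bag_position(mask))  # (7.0, 8.0)
```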
And S104, judging whether a garbage throwing action occurs or not according to the position coordinate of the human elbow joint point and the position coordinate of the garbage bag.
For any original image frame, the distance between the position coordinates of the human elbow joint point and the position coordinates of the garbage bag can be calculated. Calculating this distance for each of a number of consecutive original image frames yields a distance sequence D as follows:
D = (d_1, d_2, …, d_t, …, d_T)
where t is the serial number of the original image frame, 1 ≤ t ≤ T, T is the total number of original image frames, and d_t is the distance between the position coordinates of the human elbow joint point and the position coordinates of the garbage bag in the t-th original image frame.
If the distance remains unchanged, no garbage throwing action has occurred.
If the distance keeps increasing, a garbage throwing action has occurred, and step S105 is executed.
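A minimal sketch of this decision, assuming the per-frame distances are computed with the Euclidean metric and that "keeps increasing" means every frame-to-frame difference exceeds a small tolerance:

```python
import math
from typing import List, Optional, Tuple

Coord = Tuple[float, float]

def throwing_action_occurred(elbow_coords: List[Coord],
                             bag_coords: List[Optional[Coord]],
                             min_increase: float = 1.0) -> bool:
    """Build the distance sequence D = (d_1, ..., d_T) between the elbow joint point and
    the garbage bag over consecutive frames, and report a throwing action when the
    distance keeps increasing (the minimum per-frame increase is an assumed tolerance)."""
    distances = [math.dist(e, b) for e, b in zip(elbow_coords, bag_coords) if b is not None]
    if len(distances) < 2:
        return False
    return all(later - earlier >= min_increase
               for earlier, later in zip(distances, distances[1:]))
```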
And S105, judging whether the garbage bag is thrown into a garbage can of the garbage throwing point or not according to the position coordinate of the garbage bag.
When a garbage throwing action occurs, the position coordinates of the garbage bag in the final frame of the consecutive original image frames are acquired.
The position range of the garbage can of the garbage throwing point can be preset, and if the position coordinate of the garbage bag is located outside the range, the garbage bag is not thrown into the garbage can, namely, the garbage is randomly discarded by current personnel.
If the position coordinates of the garbage bag are within the range, it indicates that the garbage bag is thrown into the garbage bin, and step S106 is executed.
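A minimal sketch of this check, assuming the preset position range of the garbage can is an axis-aligned rectangle in image coordinates (the bounds shown are hypothetical):

```python
from typing import Tuple

# Hypothetical preset position range of the garbage can, as pixel bounds in the image:
# (x_min, y_min, x_max, y_max).
BIN_REGION = (400, 300, 560, 420)

def thrown_into_bin(bag_position: Tuple[float, float],
                    bin_region: Tuple[float, float, float, float] = BIN_REGION) -> bool:
    """Return True if the final garbage bag coordinates fall inside the preset bin region."""
    x, y = bag_position
    x_min, y_min, x_max, y_max = bin_region
    return x_min <= x <= x_max and y_min <= y <= y_max
```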
And S106, determining the garbage type in the garbage bag through a preset garbage type recognition model, and judging whether the garbage throwing is correct according to the garbage type.
The garbage type identification model is a neural network model which is trained in advance and used for garbage type identification, and the finally obtained garbage type can be as follows: wet waste, dry waste, recoverable waste or hazardous waste.
The specific type of neural network model can be set according to actual conditions. In the embodiment of the present invention, a BP neural network is preferably used as the garbage type identification model, and other neural network models may be selected, which is not specifically limited in the embodiment of the present invention.
Before the garbage type recognition model is used, a training data set for training the garbage type recognition model can be constructed, and then the initial garbage type recognition model is trained by using the training data set until a preset training condition is met, so that the trained garbage type recognition model is obtained. The specific training process may refer to any training method in the prior art, and the embodiment of the present invention is not limited in this respect.
After the garbage type in the garbage bag is determined, the garbage type corresponding to the garbage can into which the garbage bag was thrown can be further determined. In a specific implementation of this embodiment of the invention, the garbage type corresponding to the garbage can can be determined by color identification: the color of the garbage can is identified first, and different colors correspond to different garbage types; for example, green corresponds to wet garbage, gray to dry garbage, blue to recyclable garbage, and red to harmful garbage.
In another specific implementation manner of the embodiment of the present invention, the corresponding garbage type may be determined by recognizing characters on the trash can, for example, if characters such as "dry garbage", "wet garbage", "recyclable garbage", and "harmful garbage" are recognized on the trash can, the corresponding garbage type may be determined.
If the garbage types in the garbage bag are consistent with the garbage types corresponding to the garbage can, the garbage throwing is correct, otherwise, the garbage throwing is wrong.
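A small sketch of the final correctness check, assuming the color-to-type correspondence described above (the mapping and type names are illustrative):

```python
# Hypothetical color-to-type mapping, following the correspondence described above.
BIN_COLOR_TO_TYPE = {
    "green": "wet garbage",
    "gray": "dry garbage",
    "blue": "recyclable garbage",
    "red": "harmful garbage",
}

def throwing_is_correct(bag_garbage_type: str, bin_color: str) -> bool:
    """The throwing is correct when the garbage type recognised in the bag matches
    the garbage type corresponding to the bin (here determined from the bin color)."""
    return BIN_COLOR_TO_TYPE.get(bin_color) == bag_garbage_type

# Usage sketch:
print(throwing_is_correct("wet garbage", "green"))  # True
print(throwing_is_correct("dry garbage", "blue"))   # False
```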
In the embodiment of the invention, the face recognition can be carried out on the personnel throwing the garbage, the identity information of the personnel throwing the garbage is determined, the garbage throwing recognition result (not thrown into the garbage can, thrown into the wrong garbage can and correctly thrown) is sent to the terminal equipment corresponding to the identity information of the personnel throwing the garbage so as to remind the personnel throwing the garbage, and the result is recorded into the server of the relevant supervision department so as to carry out corresponding reward or punishment on the personnel throwing the garbage.
In summary, in the embodiment of the invention, an original image frame of the target area is collected by a preset camera device; human body joint point identification is performed on the original image frame through a preset human body joint point identification model to obtain the position coordinates of the human elbow joint point; garbage bag identification is performed on the original image frame through a preset garbage bag identification model to obtain the position coordinates of the garbage bag; whether a garbage throwing action occurs is judged according to the position coordinates of the human elbow joint point and the position coordinates of the garbage bag; if a garbage throwing action occurs, whether the garbage bag is thrown into the garbage can of the garbage throwing point is judged according to the position coordinates of the garbage bag; and if so, the garbage type in the garbage bag is determined through a preset garbage type identification model, and whether the garbage is thrown correctly is judged according to the garbage type. Through the embodiment of the invention, the input cost of manual management can be greatly reduced, management efficiency is improved, and current practical needs are effectively met.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 shows a structural diagram of a garbage throwing recognition apparatus according to an embodiment of the present invention, corresponding to the garbage throwing recognition method described in the foregoing embodiments.
In this embodiment, a garbage throwing recognition device may include:
an original image frame acquisition module 301, configured to acquire an original image frame of a target area through a preset camera device;
a human body joint point identification module 302, configured to perform human body joint point identification on the original image frame through a preset human body joint point identification model, so as to obtain position coordinates of a human body elbow joint point;
a garbage bag identification module 303, configured to perform garbage bag identification on the original image frame through a preset garbage bag identification model to obtain a position coordinate of a garbage bag;
a garbage throwing behavior judging module 304, configured to judge whether a garbage throwing behavior occurs according to the position coordinate of the human elbow joint point and the position coordinate of the garbage bag;
a garbage can entering judging module 305, configured to judge whether the garbage bag is thrown into a garbage can of the garbage throwing point according to the position coordinate of the garbage bag if a garbage throwing action occurs;
and a garbage throwing judging module 306, configured to determine the garbage type in the garbage bag through a preset garbage type identification model if the garbage bag is thrown into the garbage can, and to judge whether the garbage throwing is correct according to the garbage type.
In a specific implementation manner of the embodiment of the present invention, the garbage throwing recognition apparatus may further include:
the image compression module is used for carrying out image compression on the original image frame through a preset image compression model to obtain a compressed image frame corresponding to the original image frame;
the image compression model comprises a first convolutional neural network, an intermediate processing network and a second convolutional neural network, and the image compression module is specifically configured to: performing convolution and downsampling processing on the original image frame by using the first convolution neural network to obtain a first processing result; processing the first processing result by using a residual error module preset in the intermediate processing network to obtain a second processing result; and performing convolution and up-sampling processing on the second processing result by using the second convolution neural network to obtain a compressed image frame corresponding to the original image frame.
In a specific implementation manner of the embodiment of the present invention, the garbage throwing recognition apparatus may further include:
the image compression model training module is used for acquiring a training sample set from a preset training sample database; the training sample set comprises training samples, each training sample comprises an original image frame and an expected output image frame, and the expected output image frames in each training sample are in one-to-one correspondence with the original image frames; inputting the original image frames in each training sample into the image compression model for image compression to obtain actual output image frames; calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample; if the first training loss value is larger than a preset first threshold value, adjusting the model parameters of the image compression model until the first training loss value is smaller than or equal to the first threshold value. The calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample comprises: calculating the first training loss value according to:
(in a mean squared error form consistent with the variable definitions below)

$$\mathrm{Loss1}=\frac{1}{N\cdot PixN}\sum_{n=1}^{N}\sum_{pix=1}^{PixN}\left(s_{n,pix}-y_{n,pix}\right)^{2}$$

wherein n is the serial number of the training sample, 1 ≤ n ≤ N, N is the total number of the training samples, pix is the serial number of the pixel, 1 ≤ pix ≤ PixN, PixN is the total number of the pixels in the image, s_{n,pix} is the value of the pix-th pixel of the expected output image frame of the n-th training sample, y_{n,pix} is the value of the pix-th pixel of the actual output image frame of the n-th training sample, and Loss1 is the first training loss value.
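A compact sketch of this criterion and of the threshold-driven parameter adjustment is given below; the squared-error form and the concrete threshold value are assumptions for illustration.

```python
# Sketch: first training loss between expected and actual output image frames, and the
# threshold-controlled parameter update described above. The squared-error form and the
# threshold value are assumptions for illustration.
import torch

def first_training_loss(expected: torch.Tensor, actual: torch.Tensor) -> torch.Tensor:
    # expected, actual: (N, PixN) flattened pixel values of the N training samples
    return ((expected - actual) ** 2).mean()

def train_compression_model(model, optimizer, originals, expected, first_threshold=0.01):
    loss1 = torch.tensor(float("inf"))
    while loss1.item() > first_threshold:          # adjust parameters until Loss1 <= threshold
        optimizer.zero_grad()
        actual = model(originals)                  # actual output image frames
        loss1 = first_training_loss(expected.flatten(1), actual.flatten(1))
        loss1.backward()
        optimizer.step()
    return model
```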
In a specific implementation manner of the embodiment of the present invention, the garbage throwing recognition apparatus may further include:
the human body joint point recognition model training module is used for acquiring a preset training sample set; the training sample set comprises sample images, and each sample image corresponds to a pre-labeled first set; respectively carrying out human body joint point recognition on each sample image in the training sample set by using the human body joint point recognition model to obtain a second set corresponding to each sample image; calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image; and if the second training loss value is greater than a preset second threshold value, adjusting the model parameters of the human body joint point identification model until the second training loss value is less than or equal to the second threshold value. The calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image includes: calculating the second training loss value according to:
(in a mean squared coordinate error form consistent with the variable definitions below)

$$\mathrm{Loss2}=\frac{1}{M\cdot PN}\sum_{m=0}^{M-1}\sum_{p=0}^{PN-1}\left[\left(FtX_{m,p}-SdX_{m,p}\right)^{2}+\left(FtY_{m,p}-SdY_{m,p}\right)^{2}\right]$$

wherein m is the serial number of each sample image, 0 ≤ m ≤ M−1, M is the total number of the sample images, p is the serial number of each human body joint point, 0 ≤ p ≤ PN−1, PN is the number of the human body joint points, FtX_{m,p} and FtY_{m,p} are respectively the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the first set corresponding to the m-th sample image, SdX_{m,p} and SdY_{m,p} are respectively the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the second set corresponding to the m-th sample image, and Loss2 is the second training loss value.
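A short sketch of this joint point loss, under the squared-distance assumption above:

```python
# Sketch: second training loss between the pre-labelled joint coordinates (first set)
# and the predicted joint coordinates (second set). The averaged squared-distance form
# is an assumption reconstructed from the variable definitions above.
import torch

def second_training_loss(first_set: torch.Tensor, second_set: torch.Tensor) -> torch.Tensor:
    # first_set, second_set: (M, PN, 2) with the last dimension holding (x, y)
    return ((first_set - second_set) ** 2).sum(dim=-1).mean()
```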
In a specific implementation manner of the embodiment of the present invention, the trash bag identification module is specifically configured to: inputting the original image frame into the garbage bag identification model, and acquiring an identification result output by the garbage bag identification model; the identification result is a binary image; respectively acquiring the position coordinates of each pixel which takes a preset first value as a value in the identification result; calculating the position coordinates of the garbage bag according to the following formula:
(taking the garbage bag position as the mean of the qualifying pixel coordinates, consistent with the variable definitions below)

$$PosX=\frac{1}{G}\sum_{g=1}^{G}PosX_{g},\qquad PosY=\frac{1}{G}\sum_{g=1}^{G}PosY_{g}$$

wherein g is the serial number of the pixels whose value is the first numerical value, 1 ≤ g ≤ G, G is the total number of the pixels whose value is the first numerical value, PosX_g and PosY_g are respectively the horizontal-axis and vertical-axis coordinates of the g-th such pixel, and PosX and PosY are respectively the horizontal-axis and vertical-axis coordinates of the garbage bag.
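This centroid computation can be sketched in a few lines; the variable names are illustrative.

```python
# Sketch: position of the garbage bag as the centroid of the pixels whose value equals
# the preset first numerical value in the binary recognition result.
import numpy as np

def bag_position(binary_mask: np.ndarray, first_value: int = 1):
    ys, xs = np.nonzero(binary_mask == first_value)
    if xs.size == 0:
        return None                              # no garbage bag pixels detected
    return float(xs.mean()), float(ys.mean())    # (PosX, PosY)
```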
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, modules and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Fig. 4 shows a schematic block diagram of a server provided by an embodiment of the present invention, and for convenience of explanation, only the parts related to the embodiment of the present invention are shown.
In this embodiment, the server 4 may include: a processor 40, a memory 41, and computer readable instructions 42 stored in the memory 41 and executable on the processor 40, for example computer readable instructions for performing the garbage throwing identification method described above. The processor 40, when executing the computer readable instructions 42, implements the steps in the above-described embodiments of the garbage throwing identification method, such as the steps S101 to S106 shown in Fig. 1. Alternatively, the processor 40, when executing the computer readable instructions 42, implements the functions of the modules/units in the above apparatus embodiments, such as the functions of the modules 301 to 306 shown in Fig. 3.
Illustratively, the computer readable instructions 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer-readable instructions 42 in the server 4.
The Processor 40 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the server 4, such as a hard disk or a memory of the server 4. The memory 41 may also be an external storage device of the server 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the server 4. Further, the memory 41 may also include both an internal storage unit of the server 4 and an external storage device. The memory 41 is used to store the computer readable instructions and other instructions and data required by the server 4. The memory 41 may also be used to temporarily store data that has been output or is to be output.
Each functional unit in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes a plurality of computer readable instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like, which can store computer readable instructions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A garbage throwing identification method based on artificial intelligence is characterized by comprising the following steps:
acquiring an original image frame of a target area through a preset camera device;
carrying out human body joint point recognition on the original image frame through a preset human body joint point recognition model to obtain the position coordinates of human body elbow joint points;
carrying out garbage bag identification on the original image frame through a preset garbage bag identification model to obtain position coordinates of the garbage bag;
judging whether a garbage throwing action occurs according to the position coordinates of the human elbow joint points and the position coordinates of the garbage bag;
if the garbage throwing action occurs, judging whether the garbage bag is thrown into a garbage can of the garbage throwing point or not according to the position coordinate of the garbage bag;
if the garbage is thrown into the garbage can, the garbage type in the garbage bag is determined through a preset garbage type recognition model, and whether the garbage throwing is correct or not is judged according to the garbage type.
2. The garbage throwing identification method according to claim 1, further comprising, after acquiring the original image frame of the target area through the preset camera device, the steps of:
performing image compression on the original image frame through a preset image compression model to obtain a compressed image frame corresponding to the original image frame;
the image compression model comprises a first convolutional neural network, an intermediate processing network and a second convolutional neural network, the original image frame is subjected to image compression through a preset image compression model to obtain a compressed image frame corresponding to the original image frame, and the method comprises the following steps:
performing convolution and downsampling processing on the original image frame by using the first convolution neural network to obtain a first processing result;
processing the first processing result by using a residual error module preset in the intermediate processing network to obtain a second processing result;
and performing convolution and up-sampling processing on the second processing result by using the second convolution neural network to obtain a compressed image frame corresponding to the original image frame.
3. The method according to claim 2, wherein the training process of the image compression model comprises:
acquiring a training sample set from a preset training sample database; the training sample set comprises training samples, each training sample comprises an original image frame and an expected output image frame, and the expected output image frames in each training sample are in one-to-one correspondence with the original image frames;
inputting the original image frames in each training sample into the image compression model for image compression to obtain actual output image frames;
calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample;
if the first training loss value is larger than a preset first threshold value, adjusting the model parameters of the image compression model until the first training loss value is smaller than or equal to the first threshold value.
4. The garbage throwing identification method according to claim 3, wherein the calculating a first training loss value between an expected output image frame and an actual output image frame in each training sample comprises:
calculating the first training loss value according to:
(in a mean squared error form consistent with the variable definitions below)

$$\mathrm{Loss1}=\frac{1}{N\cdot PixN}\sum_{n=1}^{N}\sum_{pix=1}^{PixN}\left(s_{n,pix}-y_{n,pix}\right)^{2}$$

wherein n is the serial number of the training sample, 1 ≤ n ≤ N, N is the total number of the training samples, pix is the serial number of the pixel, 1 ≤ pix ≤ PixN, PixN is the total number of the pixels in the image, s_{n,pix} is the value of the pix-th pixel of the expected output image frame of the n-th training sample, y_{n,pix} is the value of the pix-th pixel of the actual output image frame of the n-th training sample, and Loss1 is the first training loss value.
5. The garbage throwing identification method according to claim 1, wherein the training process of the human body joint point recognition model comprises:
acquiring a preset training sample set; the training sample set comprises sample images, and each sample image corresponds to a pre-labeled first set;
respectively carrying out human body joint point recognition on each sample image in the training sample set by using the human body joint point recognition model to obtain a second set corresponding to each sample image;
calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image;
and if the second training loss value is greater than a preset second threshold value, adjusting the model parameters of the human body joint point identification model until the second training loss value is less than or equal to the second threshold value.
6. The method according to claim 5, wherein the calculating a second training loss value of the training sample set according to the first set and the second set corresponding to each sample image comprises:
calculating the second training loss value according to:
(in a mean squared coordinate error form consistent with the variable definitions below)

$$\mathrm{Loss2}=\frac{1}{M\cdot PN}\sum_{m=0}^{M-1}\sum_{p=0}^{PN-1}\left[\left(FtX_{m,p}-SdX_{m,p}\right)^{2}+\left(FtY_{m,p}-SdY_{m,p}\right)^{2}\right]$$

wherein m is the serial number of each sample image, 0 ≤ m ≤ M−1, M is the total number of the sample images, p is the serial number of each human body joint point, 0 ≤ p ≤ PN−1, PN is the number of the human body joint points, FtX_{m,p} and FtY_{m,p} are respectively the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the first set corresponding to the m-th sample image, SdX_{m,p} and SdY_{m,p} are respectively the horizontal-axis and vertical-axis coordinates of the p-th human body joint point in the second set corresponding to the m-th sample image, and Loss2 is the second training loss value.
7. The garbage throwing identification method according to any one of claims 1 to 6, wherein the performing garbage bag identification on the original image frame through a preset garbage bag identification model to obtain the position coordinates of the garbage bag comprises:
inputting the original image frame into the garbage bag identification model, and acquiring an identification result output by the garbage bag identification model; the identification result is a binary image;
respectively acquiring the position coordinates of each pixel which takes a preset first value as a value in the identification result;
calculating the position coordinates of the garbage bag according to the following formula:
(taking the garbage bag position as the mean of the qualifying pixel coordinates, consistent with the variable definitions below)

$$PosX=\frac{1}{G}\sum_{g=1}^{G}PosX_{g},\qquad PosY=\frac{1}{G}\sum_{g=1}^{G}PosY_{g}$$

wherein g is the serial number of the pixels whose value is the first numerical value, 1 ≤ g ≤ G, G is the total number of the pixels whose value is the first numerical value, PosX_g and PosY_g are respectively the horizontal-axis and vertical-axis coordinates of the g-th such pixel, and PosX and PosY are respectively the horizontal-axis and vertical-axis coordinates of the garbage bag.
8. A garbage throwing recognition apparatus, comprising:
the original image frame acquisition module is used for acquiring an original image frame of a target area through a preset camera device;
the human body joint point identification module is used for carrying out human body joint point identification on the original image frame through a preset human body joint point identification model to obtain the position coordinates of human body elbow joint points;
the garbage bag identification module is used for identifying the garbage bags in the original image frame through a preset garbage bag identification model to obtain position coordinates of the garbage bags;
the garbage throwing action judging module is used for judging whether a garbage throwing action occurs according to the position coordinates of the human body elbow joint points and the position coordinates of the garbage bags;
the garbage bag feeding judging module is used for judging whether the garbage bag is fed into the garbage can of the garbage feeding point or not according to the position coordinate of the garbage bag if a garbage feeding action occurs;
and the garbage throwing judging module is used for determining the garbage type in the garbage bag through a preset garbage type identification model if the garbage is thrown into the garbage can, and judging whether the garbage throwing is correct according to the garbage type.
9. A computer readable storage medium storing computer readable instructions, wherein the computer readable instructions, when executed by a processor, implement the steps of the garbage throwing identification method according to any one of claims 1 to 7.
10. A server comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer readable instructions, implements the steps of the garbage throwing identification method according to any one of claims 1 to 7.
CN202111277749.7A 2021-10-29 2021-10-29 Artificial intelligence-based garbage throwing identification method, device, medium and server Pending CN113887519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111277749.7A CN113887519A (en) 2021-10-29 2021-10-29 Artificial intelligence-based garbage throwing identification method, device, medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111277749.7A CN113887519A (en) 2021-10-29 2021-10-29 Artificial intelligence-based garbage throwing identification method, device, medium and server

Publications (1)

Publication Number Publication Date
CN113887519A true CN113887519A (en) 2022-01-04

Family

ID=79014590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111277749.7A Pending CN113887519A (en) 2021-10-29 2021-10-29 Artificial intelligence-based garbage throwing identification method, device, medium and server

Country Status (1)

Country Link
CN (1) CN113887519A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882597A (en) * 2022-07-11 2022-08-09 浙江大华技术股份有限公司 Target behavior identification method and device and electronic equipment
CN115504121A (en) * 2022-09-23 2022-12-23 浙江联运知慧科技有限公司 AI-based system and method for automatically identifying garbage delivery behaviors of users


Similar Documents

Publication Publication Date Title
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN111127308B (en) Mirror image feature rearrangement restoration method for single sample face recognition under partial shielding
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
CN111091109B (en) Method, system and equipment for predicting age and gender based on face image
CN110163211B (en) Image recognition method, device and storage medium
CN113887519A (en) Artificial intelligence-based garbage throwing identification method, device, medium and server
CN109993269B (en) Single image crowd counting method based on attention mechanism
CN111292262B (en) Image processing method, device, electronic equipment and storage medium
CN102253995B (en) Method and system for realizing image search by using position information
CN107133590B (en) A kind of identification system based on facial image
CN109559362B (en) Image subject face replacing method and device
CN104751485B (en) GPU adaptive foreground extracting method
CN111428664A (en) Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN111046213B (en) Knowledge base construction method based on image recognition
CN110210344A (en) Video actions recognition methods and device, electronic equipment, storage medium
CN114764941A (en) Expression recognition method and device and electronic equipment
CN103824074A (en) Crowd density estimation method based on background subtraction and texture features and system
CN116343100A (en) Target identification method and system based on self-supervision learning
CN115690934A (en) Master and student attendance card punching method and device based on batch face recognition
CN115909400A (en) Identification method for using mobile phone behaviors in low-resolution monitoring scene
CN115731620A (en) Method for detecting counter attack and method for training counter attack detection model
CN112288861B (en) Single-photo-based automatic construction method and system for three-dimensional model of human face
CN114373205A (en) Face detection and recognition method based on convolution width network
CN114038045A (en) Cross-modal face recognition model construction method and device and electronic equipment
CN113052087A (en) Face recognition method based on YOLOV5 model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination