CN117853883B - Street cleanliness analysis method and device, computer equipment and storage medium - Google Patents
Street cleanliness analysis method and device, computer equipment and storage medium
- Publication number
- CN117853883B (application CN202410259533.5A)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- image
- garbage
- street
- inputting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000003749 cleanliness Effects 0.000 title claims abstract description 36
- 238000004458 analytical method Methods 0.000 title claims abstract description 30
- 239000013598 vector Substances 0.000 claims abstract description 116
- 238000001514 detection method Methods 0.000 claims abstract description 45
- 230000004927 fusion Effects 0.000 claims abstract description 44
- 238000012545 processing Methods 0.000 claims abstract description 29
- 238000000034 method Methods 0.000 claims abstract description 22
- 238000000605 extraction Methods 0.000 claims abstract description 18
- 238000004590 computer program Methods 0.000 claims description 12
- 230000006870 function Effects 0.000 claims description 12
- 238000010606 normalization Methods 0.000 claims description 9
- 238000007499 fusion processing Methods 0.000 claims description 6
- 230000004913 activation Effects 0.000 claims description 4
- 230000007246 mechanism Effects 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 claims description 3
- 230000000875 corresponding effect Effects 0.000 description 39
- 238000010586 diagram Methods 0.000 description 9
- 238000012549 training Methods 0.000 description 8
- 238000004422 calculation algorithm Methods 0.000 description 6
- 238000004140 cleaning Methods 0.000 description 5
- 238000007689 inspection Methods 0.000 description 4
- 230000000694 effects Effects 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000010276 construction Methods 0.000 description 2
- 230000002159 abnormal effect Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/38—Outdoor scenes
- G06V20/39—Urban scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a street cleanliness analysis method, a street cleanliness analysis device, computer equipment and a storage medium. The method comprises the following steps: acquiring a street image, inputting it into a target detection model for garbage identification processing, and outputting an alarm image and the target detection information in the alarm image; inputting the alarm image and the corresponding target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features; fusing the global image feature with the local target image features to obtain a fusion feature vector; matching corresponding text description vectors to the fusion feature vector based on a preset target object description rule, and outputting matching probabilities; and regarding the matching probabilities as a 1×N feature vector, inputting it into a work order classification multi-layer perceptron for work order prediction, and outputting the service work order type and its specific probability value. The method can intelligently analyze the quantity, type, position and other attributes of garbage on street pavements in real time, and improves the standardization level and operating efficiency of urban sanitation services.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a street cleanliness analysis method, a street cleanliness analysis device, a computer device, and a storage medium.
Background
Against the background of accelerating urbanization, providing citizens with efficient and accurate basic services and building a good living environment are important indicators of a city's service capability. Among these services, urban sanitation is the most basic and the one citizens perceive most directly.
Current urban sanitation service modes fall roughly into three types. The first is manual service: traditional urban sanitation usually relies on manual inspection and on-the-spot handling, which is flexible but inefficient. The second is the unmanned sweeper, which solves the efficiency problem but cannot handle special situations such as exposed garbage in irregular terrain (green belts, flower beds and the like) and cannot handle larger garbage. The third combines machine inspection with manual dispatch to cover the situations the second type cannot handle. Even so, the analysis in the third type is still not fine-grained enough and requires back-office personnel to classify the alarms manually.
Disclosure of Invention
The invention aims to provide a street cleanliness analysis method, a street cleanliness analysis device, computer equipment and a storage medium, so as to solve the problems that existing street and roadside cleanliness analysis is not fine-grained enough and still requires manual classification.
In a first aspect, an embodiment of the present invention provides a method for analyzing street cleanliness, including:
Acquiring street images, inputting a target detection model to carry out garbage identification processing, and outputting alarm images and target detection information in the alarm images;
Inputting the alarm image and the corresponding target detection information into a Clip model for feature extraction, and outputting corresponding global image features and local target image features;
performing fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
Matching the corresponding text description vector for the fusion feature vector based on a preset target object description rule, and outputting a matching probability;
And regarding the matching probabilities as a 1×N feature vector, inputting the feature vector into a work order classification multi-layer perceptron for work order prediction, and outputting the service work order type and the specific probability value.
In a second aspect, an embodiment of the present invention provides a street cleanliness analyzing apparatus, including:
The recognition unit is used for acquiring street images, inputting a target detection model for garbage recognition processing, and outputting an alarm image and target detection information in the alarm image;
the feature extraction unit is used for inputting the alarm image and the corresponding target detection information into a Clip model to perform feature extraction and outputting the corresponding global image feature and local target image feature;
The feature fusion unit is used for carrying out fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
The matching unit is used for matching the corresponding text description vector for the fusion feature vector based on a preset target object description rule and outputting matching probability;
and the prediction unit is used for regarding the matching probabilities as a 1×N feature vector, inputting the feature vector into the work order classification multi-layer perceptron for work order prediction, and outputting the service work order type and the specific probability value.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the street cleanliness analysis method according to the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium, wherein the computer readable storage medium stores a computer program, which when executed by a processor, causes the processor to perform the street cleanliness analysis method as described in the first aspect above.
The embodiment of the invention discloses a street cleanliness analysis method, a street cleanliness analysis device, computer equipment and a storage medium. The method comprises the following steps: acquiring a street image, inputting it into a target detection model for garbage identification processing, and outputting an alarm image and the target detection information in the alarm image; inputting the alarm image and the corresponding target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features; fusing the global image feature with the local target image features to obtain a fusion feature vector; matching corresponding text description vectors to the fusion feature vector based on a preset target object description rule, and outputting matching probabilities; and regarding the matching probabilities as a 1×N feature vector, inputting it into a work order classification multi-layer perceptron for work order prediction, and outputting the service work order type and the specific probability value. The embodiment of the invention can intelligently analyze the quantity, type, position and other attributes of garbage on street pavements in real time, and can automatically dispatch corresponding cleaning work orders to sanitation workers in different procedures according to the analysis result, thereby effectively improving the standardization level and operating efficiency of urban sanitation services.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sub-flow of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a sub-flowchart of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flow of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image feature fusion subnetwork according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a work order classification multi-layer perceptron subnetwork provided by an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a street cleanliness analyzing device according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a flow chart of a street cleanliness analysis method according to an embodiment of the invention;
As shown in FIG. 1, the method includes steps S101-S105.
S101, acquiring a street image, inputting a target detection model for garbage identification processing, and outputting an alarm image and target detection information in the alarm image;
In this step, streets can be patrolled manually or by an unmanned vehicle, a street video stream is acquired by a vehicle-mounted camera, and the street images in the video stream are fed frame by frame into a target detection algorithm (a YOLO-family model may be adopted; other target detection algorithms can achieve the same effect). When garbage appears, the street image containing garbage is output as an alarm image, and the target detection information b_i (i = 0, 1, …) in the alarm image is obtained. The target detection information b_i specifically contains {x_bi, y_bi, w_bi, h_bi, c_bi, p_bi}, where x_bi, y_bi, w_bi, h_bi are the abscissa of the center point, the ordinate of the center point, the width and the height of the rectangular box around the detected garbage, c_bi is the garbage type, and p_bi is the corresponding confidence.
In this step, the garbage types identified by the target detection algorithm may include scattered garbage, packed garbage, greening garbage, piled garbage, construction and furniture garbage, and the like; in a specific implementation, the garbage types that the edge-side target detection algorithm needs to detect can be determined by comprehensively considering the service types, the algorithm's practical performance, and other factors.
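By way of illustration of step S101, the following sketch runs a detector over a single street frame and assembles the detection information b_i. It assumes the open-source ultralytics YOLO API; the weight file "garbage_yolov8.pt" and the class list are hypothetical placeholders, since the patent only requires a YOLO-family (or equivalent) detector.

```python
# Minimal sketch of step S101, assuming the ultralytics YOLO API; the weight file
# and class names below are hypothetical, not values given by the patent.
from ultralytics import YOLO

GARBAGE_CLASSES = ["scattered", "packed", "greening", "piled", "construction_furniture"]

model = YOLO("garbage_yolov8.pt")  # hypothetical detector fine-tuned on street-garbage data

def detect_garbage(street_image):
    """Run detection on one frame; return (is_alarm, detection info b_i)."""
    result = model(street_image)[0]
    detections = []
    for box in result.boxes:
        x_c, y_c, w, h = box.xywh[0].tolist()            # rectangle centre x/y, width, height
        detections.append({
            "x": x_c, "y": y_c, "w": w, "h": h,
            "c": GARBAGE_CLASSES[int(box.cls)],          # garbage type c_bi
            "p": float(box.conf),                        # confidence p_bi
        })
    return len(detections) > 0, detections               # frames with detections become alarm images
```

In practice the detector would run frame by frame on the vehicle-side video stream, and only frames for which detections are returned are treated as alarm images and passed on.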
S102, inputting the alarm image and the corresponding target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features;
In this step, Clip (CLIP) is a large pre-trained model; its backbone network is an open-source pre-trained model obtained by training on 400 million high-quality image-text pairs, has strong image-text alignment capability, and performs well in zero-shot and few-shot learning. The alarm image and the target detection information b_i obtained in step S101 are input into the Clip model for feature extraction, yielding the corresponding global image feature and local target image features.
S103, carrying out fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
This step uses a designed image feature fusion sub-network (as shown in fig. 5) to fuse the global image feature with the local target image features.
S104, matching corresponding text description vectors for the fusion feature vectors based on a preset target object description rule, and outputting matching probability;
In this step, in order to relate the image to the service rules, common concept words used in sanitation work can be compiled according to business expert experience to formulate the target object description rule; a corresponding text description is then matched to the fusion feature vector, which facilitates the classification of the service work order.
S105, regarding the matching probabilities as a 1×N feature vector, inputting the feature vector into a work order classification multi-layer perceptron for work order prediction, and outputting the service work order type and the specific probability value.
In this embodiment, based on the process of steps S101 to S105, the application can acquire inspection images in real time through a vehicle-mounted camera, detect target garbage on the street in real time at the edge, and upload the alarm image and target detection information to the cloud. The cloud model then predicts, from the alarm image and target detection information, a text description of the alarm image that is strongly associated with the service rules, and automatically classifies it into the corresponding work order processing type through the work order classification multi-layer perceptron (see fig. 6). This effectively improves work order dispatch efficiency and thereby the efficiency of human-machine cooperation in urban sanitation services.
In one embodiment, as shown in fig. 2, step S102 includes:
S201, acquiring a region of interest I_ri in the corresponding alarm image according to the target detection information;
S202, normalizing the region of interest I_ri and the corresponding alarm image I and then scaling them to the same size, obtaining a region of interest I'_ri and a corresponding alarm image I';
S203, inputting the region of interest I'_ri and the corresponding alarm image I' respectively into the image encoder of the Clip model for feature extraction, obtaining the local target image features and the global image feature respectively.
In this embodiment, the image encoder (Image-Encoder) of the Clip model may employ a Vision Transformer backbone. The specific feature extraction process is as follows: according to the target detection information b_i, the region of interest in the corresponding alarm image is acquired and recorded as I_ri; the region of interest I_ri and the alarm image I are simultaneously normalized and scaled to the same size, e.g. 224×224, to obtain a region of interest I'_ri and a corresponding alarm image I'; the region of interest I'_ri and the corresponding alarm image I' are then input separately into the Image-Encoder of the Clip model for feature extraction, yielding the local target image features (Region Embedding) and the global image feature (Image Embedding).
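A minimal sketch of this extraction step is shown below. It assumes the open-source "clip" package with a ViT-B/32 image encoder; the CLIP preprocess function already performs the resize to 224×224 and the normalization described above, and the crop coordinates are taken from the detection information b_i.

```python
# Sketch of step S102: encode the whole alarm image and each region of interest with CLIP.
# Assumes the open-source "clip" package (ViT backbone); crop coordinates come from b_i.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)   # preprocess = resize 224x224 + normalise

def extract_features(alarm_image: Image.Image, detections):
    """Return (global Image Embedding, list of local Region Embeddings)."""
    with torch.no_grad():
        global_feat = clip_model.encode_image(preprocess(alarm_image).unsqueeze(0).to(device))
        local_feats = []
        for b in detections:
            left, top = b["x"] - b["w"] / 2, b["y"] - b["h"] / 2
            roi = alarm_image.crop((int(left), int(top),
                                    int(left + b["w"]), int(top + b["h"])))   # region of interest I_ri
            local_feats.append(clip_model.encode_image(preprocess(roi).unsqueeze(0).to(device)))
    return global_feat, local_feats
```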
In one embodiment, as shown in fig. 3, step S103 includes:
S301, performing a concat splicing operation and a conv convolution operation on all local target image features in the alarm image to obtain a fused local target image feature;
S302, performing an add superposition operation and a conv convolution operation on the fused local target image feature and the corresponding global image feature to obtain a preliminary fusion feature vector;
S303, inputting the original feature information of the preliminary fusion feature vector into an attention mechanism module for weight feature extraction, and multiplying the weight features with the original feature information of the preliminary fusion feature vector to obtain the final fusion feature vector.
In this embodiment, the global image feature and the local target image features are fused through the designed image feature fusion sub-network (as shown in fig. 5). Specifically, a concat splicing operation is first performed on all local target image features in the alarm image, followed by a conv convolution with a 1×1 kernel, to obtain the fused local target image feature. Then, an add superposition operation is performed on the fused local target image feature and the corresponding global image feature, and the result is fused through one conv convolution to obtain the preliminary fusion feature vector. Next, the original feature information of the preliminary fusion feature vector is input into the constructed attention mechanism module (Attention Block). The attention mechanism module comprises two branches: branch 1 is the original feature information of the preliminary fusion feature vector, and branch 2 consists of two fully connected layers (FC) with their corresponding ReLU activation layer and Sigmoid layer. The features obtained after the original feature information passes through branch 2 are called weight features, and the weight features are multiplied with the original feature information of branch 1 to obtain the final fusion feature vector (Image-Region Embedding) that incorporates the local target information.
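The following PyTorch module is a rough sketch of this fusion sub-network under stated assumptions: the embedding dimension, the fixed (zero-padded) number of regions and the reduction ratio inside the attention branch are not specified by the patent and are chosen here only for illustration.

```python
# Sketch of the image feature fusion sub-network (fig. 5): concat + 1x1 conv on local
# features, add + conv with the global feature, then an FC-ReLU-FC-Sigmoid attention
# branch whose weights rescale the preliminary fusion vector. Sizes are assumptions.
import torch
import torch.nn as nn

class ImageFeatureFusion(nn.Module):
    def __init__(self, dim=512, max_regions=8, reduction=4):
        super().__init__()
        self.local_fuse = nn.Conv1d(max_regions, 1, kernel_size=1)   # fuse concatenated local features
        self.post_fuse = nn.Conv1d(1, 1, kernel_size=1)              # conv after the add operation
        self.attn = nn.Sequential(                                   # branch 2: two FC layers + ReLU/Sigmoid
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, global_feat, local_feats):
        # global_feat: (B, dim); local_feats: (B, max_regions, dim), zero-padded if fewer regions
        fused_local = self.local_fuse(local_feats).squeeze(1)                         # concat + 1x1 conv
        prelim = self.post_fuse((fused_local + global_feat).unsqueeze(1)).squeeze(1)  # add + conv
        weights = self.attn(prelim)                                                   # weight features (branch 2)
        return prelim * weights                                                       # final Image-Region Embedding
```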
In one embodiment, as shown in fig. 4, step S104 includes:
S401, constructing a text prompt word library according to the garbage types, garbage positions and garbage quantities in the target object description rule;
S402, inputting the text prompt word library into the text encoder of the Clip model for encoding processing, and outputting a text description feature vector library;
S403, performing norm normalization processing on the fusion feature vector and the text description feature vector library, performing a dot product operation, and then inputting the result into a sigmoid layer to obtain the matching probability of the fusion feature vector with each text description feature vector in the text description feature vector library.
In this embodiment, the construction process of step S401 can refer to the example shown in Table 1 below;
TABLE 1

| Attribute | Concept words |
|---|---|
| Garbage type | Scattered garbage, packed garbage, greening garbage, piled garbage, construction and furniture garbage |
| Garbage position | Pavement, sidewalk, green belt, flower bed, garbage point |
| Garbage quantity | None (clean), small amount, moderate amount, large amount |
The text encoder in the Clip model has context analysis capability and may adopt a Transformer backbone; constructing text prompt words can effectively improve the accuracy of the textual description of the image. For the concept words illustrated in Table 1, text prompts can be constructed such as: "a picture of a road surface with a small amount of scattered garbage". As shown in Table 1, a total of 5 × 4 × 3 + 1 = 61 prompt phrases can be constructed, with only 1 prompt corresponding to a clean street; all prompt words are input into the Clip text encoder to obtain the corresponding text description feature vectors (Text Embedding), i.e. the text description feature vector library.
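As a sketch of steps S401-S402, the snippet below assembles a prompt library from the Table 1 attributes and encodes it with the CLIP text encoder. The English phrasing of the prompts is an illustrative assumption, and the resulting count need not exactly match the 61 prompts stated above.

```python
# Sketch of the text prompt word library and its encoding (steps S401-S402); prompt
# wording is illustrative and may combine to a slightly different total than 61.
import clip
import torch

TYPES = ["scattered garbage", "packed garbage", "greening garbage",
         "piled garbage", "construction and furniture garbage"]
PLACES = ["the road surface", "the sidewalk", "the green belt", "the flower bed", "the garbage point"]
AMOUNTS = ["a small amount of", "a moderate amount of", "a large amount of"]

prompts = ["a picture of a clean street"]                      # the single clean-street prompt
prompts += [f"a picture of {p} with {a} {t}"                   # one prompt per attribute combination
            for t in TYPES for p in PLACES for a in AMOUNTS]

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    text_bank = model.encode_text(clip.tokenize(prompts).to(device))   # text description feature vector library
```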
Specifically, step S403 includes:
First, the fusion feature vector and each text description feature vector are norm-normalized as follows:

$$f \leftarrow \frac{f}{\sqrt{\sum_{n} f_n^{2}}}$$

where f is the fusion feature vector or a text description feature vector in the text description feature vector library, and f_n denotes the nth component of the feature vector.

Then, norm normalization processing is applied to the fusion feature vector and the text description feature vector to obtain a feature vector f_im and a feature vector f_t respectively, where im denotes image and t denotes text.

Finally, the feature vectors f_im and f_t are input into the formula

$$P = \mathrm{sigmoid}\left(f_{im} \cdot f_t^{T}\right)$$

to calculate the matching probability P, where f_t^T denotes the transpose of the text description feature vector f_t.
Based on the process of step S403, the matching probability of the fusion feature vector with each text description feature vector in the text description feature vector library can be obtained.
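A compact sketch of the matching computation of step S403, assuming a single fusion feature vector matched against the whole text description feature vector library:

```python
# Step S403 sketch: L2 norm-normalise both sides, take the dot product, apply sigmoid.
import torch
import torch.nn.functional as F

def match_probability(fusion_vec: torch.Tensor, text_bank: torch.Tensor) -> torch.Tensor:
    """fusion_vec: (D,); text_bank: (N, D). Returns matching probabilities P of shape (N,)."""
    f_im = F.normalize(fusion_vec, dim=-1)      # norm-normalised fusion feature vector
    f_t = F.normalize(text_bank, dim=-1)        # norm-normalised text description feature vectors
    return torch.sigmoid(f_t @ f_im)            # P = sigmoid(f_im · f_t^T) for every prompt
```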
In one embodiment, step S105 includes:
Regarding the matching probabilities P as a 1×N feature vector, where N represents the total number of feature vectors in the text description feature vector library;

The service work order type s and the specific probability value q are calculated according to the following formulas:

$$s = \arg\max\left(\mathrm{Softmax}\left(\mathrm{FC}_2\left(\mathrm{GeLU}\left(\mathrm{FC}_1(P)\right)\right)\right)\right)$$

$$q = \max\left(\mathrm{Softmax}\left(\mathrm{FC}_2\left(\mathrm{GeLU}\left(\mathrm{FC}_1(P)\right)\right)\right)\right)$$

where FC_1 and FC_2 are the fully connected layers, and GeLU and Softmax are the activation functions after the first and the second fully connected layer, respectively.
In this embodiment, in order to turn the service-rule classification obtained in the preceding steps into the classification of the actual processing work order, the matching probabilities P obtained by matching the alarm image with the text descriptions are regarded as a 1×61 feature vector, a multi-layer perceptron subnetwork (MLP) with two fully connected layers is constructed as shown in fig. 6, and the matching probabilities P are input into it as a feature vector to obtain the predicted service work order type s and the specific probability value q. The value of s is the index of the service work order type; for example, if the service work order types are divided into no processing, machine cleaning, manual cleaning, and combined human-machine cleaning, then s = 0 means no processing is needed.
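The work order classification head can be sketched as the following two-layer perceptron; the hidden width and the four work order types are assumptions taken from the example above, not values fixed by the patent.

```python
# Sketch of the work order classification MLP (fig. 6): FC -> GeLU -> FC -> Softmax,
# with s the arg-max class index and q its probability. Hidden size is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorkOrderMLP(nn.Module):
    def __init__(self, num_prompts=61, hidden=128, num_order_types=4):
        super().__init__()
        self.fc1 = nn.Linear(num_prompts, hidden)
        self.fc2 = nn.Linear(hidden, num_order_types)

    def forward(self, p):                                  # p: (B, num_prompts) matching probabilities
        q_all = torch.softmax(self.fc2(F.gelu(self.fc1(p))), dim=-1)
        s = q_all.argmax(dim=-1)                           # predicted work order type s
        q = q_all.max(dim=-1).values                       # its specific probability value q
        return s, q
```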
Some additional details of the above steps are provided below.
In step S101, before the YOLO target detection algorithm is used, a dataset is constructed from manually collected or open-source data. The data are images of a fixed size, and the labels are the position information, type and other attributes of the target garbage in each image. An optimal model can be obtained and deployed by optimizing the objective function L_yolo on this dataset. The objective function is:

$$L_{yolo} = \alpha L_{cls} + \beta L_{bbox} + \gamma L_{obj}$$

where L_cls, L_bbox and L_obj are the category loss, the position loss and the objectness loss respectively, and α, β and γ are adjustable hyperparameters.
In step S103, the image feature fusion sub-network likewise requires manually collected or open-source data to build a training set before use. The data here are the original street images and the corresponding target position information from step S101, and the labels are the corresponding text descriptions in the form of multi-label one-hot encoding. For example, if the 61 text descriptions are ordered as: a clean street, a picture of a road surface with a small amount of scattered garbage, …, a picture of a green belt with a large amount of packed garbage, …, then an encoding of 011… means the picture simultaneously contains a small amount of scattered garbage on the road surface and a large amount of packed garbage on the green belt.

The loss function L_clip can adopt a multi-label cross-entropy loss function:

$$L_{clip} = -\sum_{j}\left[a_j \log \hat{a}_j + \left(1 - a_j\right)\log\left(1 - \hat{a}_j\right)\right]$$

where j is the sample index, and a_j and \hat{a}_j are the true label distribution and the predicted label distribution, respectively.
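A small sketch of this training loss, assuming the standard multi-label binary cross-entropy form over the prompt labels:

```python
# Sketch of L_clip as a multi-label cross-entropy over the 61 prompt labels (assumed form).
import torch
import torch.nn.functional as F

def clip_multilabel_loss(pred_probs: torch.Tensor, true_labels: torch.Tensor) -> torch.Tensor:
    """pred_probs, true_labels: (B, 61); true_labels is the multi-label one-hot encoding."""
    return F.binary_cross_entropy(pred_probs, true_labels.float())
```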
In step S105, the work order classification multi-layer perceptron also needs a training set to be constructed before use. Its training data are the text description feature vectors representing the alarm images obtained in step S104, its labels are the one-hot codes of the service work order types, and the loss function L_order can be a cross-entropy function.

In addition, since the output of step S103 is exactly the input of step S105 (through step S104), steps S103 and S105 may be trained jointly, but the intermediate loss function L_clip needs to be retained in order to obtain a better training effect.
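The joint-training idea can be sketched as below; the dummy tensors and the unweighted sum of the two losses are illustrative assumptions rather than settings given by the patent.

```python
# Rough sketch of joint training: keep the intermediate loss L_clip alongside the
# work order loss L_order. Tensors are stand-ins; the equal loss weighting is assumed.
import torch
import torch.nn.functional as F

batch, num_prompts, num_order_types = 4, 61, 4
match_probs = torch.rand(batch, num_prompts, requires_grad=True)        # stand-in for sigmoid matching output
prompt_labels = torch.randint(0, 2, (batch, num_prompts)).float()       # multi-label targets
order_logits = torch.randn(batch, num_order_types, requires_grad=True)  # stand-in for MLP output (pre-softmax)
order_labels = torch.randint(0, num_order_types, (batch,))

l_clip = F.binary_cross_entropy(match_probs, prompt_labels)             # intermediate multi-label loss
l_order = F.cross_entropy(order_logits, order_labels)                   # work order classification loss
(l_clip + l_order).backward()                                           # single joint optimisation step
```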
The embodiment of the invention also provides a street cleanliness analyzing device, which is used for executing any embodiment of the street cleanliness analyzing method. In particular, referring to fig. 7, fig. 7 is a schematic block diagram of a street cleanliness analyzing apparatus according to an embodiment of the present invention.
As shown in fig. 7, the street cleanliness analyzing device 700 includes: an identification unit 701, a feature extraction unit 702, a feature fusion unit 703, a matching unit 704, and a prediction unit 705.
The recognition unit 701 is used for acquiring street images, inputting a target detection model for garbage recognition processing, and outputting an alarm image and target detection information in the alarm image;
The feature extraction unit 702 is configured to input the alert image and the corresponding target detection information into the Clip model to perform feature extraction, and output the corresponding global image feature and local target image feature;
A feature fusion unit 703, configured to perform fusion processing on the global image feature and the local target image feature, so as to obtain a fusion feature vector;
The matching unit 704 is configured to match the fusion feature vector with a corresponding text description vector based on a preset target object description rule, and output a matching probability;
and the prediction unit 705 is used for regarding the matching probabilities as a 1×N feature vector, inputting the feature vector into the work order classification multi-layer perceptron for work order prediction, and outputting the service work order type and the specific probability value.
The device can acquire inspection images in real time through a vehicle-mounted camera, detect target garbage on the street in real time at the edge, and upload the alarm image and target detection information to the cloud. The cloud model then predicts, from the alarm image and target detection information, a text description of the alarm image that is strongly correlated with the service rules and, based on that text description, automatically classifies it into the corresponding work order processing type through the work order classification multi-layer perceptron. This effectively improves work order dispatch efficiency and thereby the efficiency of human-machine cooperation in urban sanitation services.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The street cleanliness analyzing apparatus described above may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 800 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to FIG. 8, the computer device 800 includes a processor 802, memory, and a network interface 805 connected by a system bus 801, wherein the memory may include a non-volatile storage medium 803 and an internal memory 804.
The nonvolatile storage medium 803 may store an operating system 8031 and a computer program 8032. The computer program 8032, when executed, causes the processor 802 to perform a street cleanliness analysis method.
The processor 802 is used to provide computing and control capabilities to support the operation of the overall computer device 800.
The internal memory 804 provides an environment for the execution of a computer program 8032 in the non-volatile storage medium 803, which computer program 8032, when executed by the processor 802, causes the processor 802 to perform a street cleanliness analysis method.
The network interface 805 is used for network communication such as providing transmission of data information and the like. It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of some of the architecture associated with the present inventive arrangements and is not limiting of the computer device 800 to which the present inventive arrangements may be applied, and that a particular computer device 800 may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 8 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 8, and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 802 may be a central processing unit (CPU); the processor 802 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the street cleanliness analysis method of an embodiment of the present invention.
The storage medium is a physical, non-transitory storage medium, and may be, for example, a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention shall be subject to the protection scope of the claims.
Claims (7)
1. A method of street cleanliness analysis, comprising:
Acquiring a street image, inputting a target detection model to perform garbage identification processing, and outputting an alarm image and target detection information in the alarm image, wherein the alarm image is the street image with garbage, and the target detection information comprises the position of the detected garbage, the garbage type and the corresponding confidence level;
Acquiring a region of interest I_ri in the corresponding alarm image according to the target detection information; normalizing the region of interest I_ri and the corresponding alarm image I and then scaling them to the same size to obtain a region of interest I'_ri and a corresponding alarm image I'; respectively inputting the region of interest I'_ri and the corresponding alarm image I' into an image encoder of a Clip model to perform feature extraction, respectively obtaining local target image features and a global image feature;
performing fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
Constructing a text prompt word library according to the garbage types, garbage positions and garbage quantities in the target object description rule; inputting the text prompt word library into a text encoder of the Clip model for encoding processing, and outputting a text description feature vector library; performing norm normalization processing on the fusion feature vector and the text description feature vector library to obtain a feature vector f_im and a feature vector f_t respectively, performing a dot product operation, and then inputting the result into a sigmoid formula to calculate the matching probability P, thereby obtaining the matching probability P of the fusion feature vector with each text description feature vector in the text description feature vector library;
Regarding the matching probabilities P as a 1×N feature vector, inputting the feature vector into a work order classification multi-layer perceptron for work order prediction, and outputting a service work order type s and a specific probability value q, where N represents the total number of feature vectors in the text description feature vector library;

The formulas for calculating the service work order type s and the specific probability value q are as follows:

$$s = \arg\max\left(\mathrm{Softmax}\left(\mathrm{FC}_2\left(\mathrm{GeLU}\left(\mathrm{FC}_1(P)\right)\right)\right)\right)$$

$$q = \max\left(\mathrm{Softmax}\left(\mathrm{FC}_2\left(\mathrm{GeLU}\left(\mathrm{FC}_1(P)\right)\right)\right)\right)$$

where FC_1 and FC_2 are the fully connected layers, and GeLU and Softmax are the activation functions after the first and the second fully connected layer, respectively.
2. The street cleanliness analyzing method according to claim 1, wherein the acquiring street images and inputting object detection models for garbage recognition processing, outputting warning images and object detection information in the warning images, comprises:
Acquiring a street video stream based on street patrol;
Inputting a street image in the street video stream into a target detection model to perform garbage type and garbage position identification processing, outputting the street image identified as containing garbage as an alarm image, and obtaining target detection information b_i (i = 0, 1, …) in the alarm image, wherein the target detection information b_i specifically contains {x_bi, y_bi, w_bi, h_bi, c_bi, p_bi}, where x_bi, y_bi, w_bi, h_bi are the abscissa of the center point, the ordinate of the center point, the width and the height of the rectangular box around the detected garbage, c_bi is the garbage type, and p_bi is the corresponding confidence.
3. The method of claim 1, wherein the fusing the global image features and the local target image features to obtain a fused feature vector comprises:
Performing concat splicing operation and conv convolution operation on all local target image features in the alarm image to obtain fused local target image features;
Performing an add superposition operation and a conv convolution operation on the fused local target image features and the corresponding global image feature to obtain a preliminary fusion feature vector;

And inputting the original feature information of the preliminary fusion feature vector into an attention mechanism module for weight feature extraction, and multiplying the weight features with the original feature information of the preliminary fusion feature vector to obtain a final fusion feature vector.
4. The method for analyzing street cleanliness according to claim 1, wherein the performing norm normalization processing on the fused feature vector and the text description feature vector library, and then performing a dot product operation, and inputting a sigmoid layer to obtain a matching probability of the fused feature vector and each text description feature vector in the text description feature vector library, includes:
Carrying out norm normalization processing according to the following formula:

$$f \leftarrow \frac{f}{\sqrt{\sum_{n} f_n^{2}}}$$

where f is the fusion feature vector or a text description feature vector in the text description feature vector library, and f_n denotes the nth component of the feature vector;

performing norm normalization processing on the fusion feature vector and the text description feature vector to obtain a feature vector f_im and a feature vector f_t respectively, where im denotes image and t denotes text;

inputting the feature vector f_im and the feature vector f_t into the following sigmoid formula to calculate the matching probability P:

$$P = \mathrm{sigmoid}\left(f_{im} \cdot f_t^{T}\right)$$

where f_t^T denotes the transpose of the text description feature vector f_t.
5. A street cleanliness analyzing apparatus, comprising:
the recognition unit is used for acquiring street images, inputting a target detection model for garbage recognition processing, and outputting an alarm image and target detection information in the alarm image, wherein the alarm image is a street image with garbage, and the target detection information comprises the position of the detected garbage, the garbage type and the corresponding confidence level;
The feature extraction unit is used for acquiring a region of interest I_ri in the corresponding alarm image according to the target detection information; normalizing the region of interest I_ri and the corresponding alarm image I and then scaling them to the same size to obtain a region of interest I'_ri and a corresponding alarm image I'; and respectively inputting the region of interest I'_ri and the corresponding alarm image I' into an image encoder of a Clip model to perform feature extraction, respectively obtaining local target image features and a global image feature;
The feature fusion unit is used for carrying out fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
The matching unit is used for constructing a text prompt word library according to the garbage types, garbage positions and garbage quantities in the target object description rule; inputting the text prompt word library into a text encoder of the Clip model for encoding processing, and outputting a text description feature vector library; and performing norm normalization processing on the fusion feature vector and the text description feature vector library to obtain a feature vector f_im and a feature vector f_t respectively, performing a dot product operation, and then inputting the result into a sigmoid formula to calculate the matching probability P, thereby obtaining the matching probability P of the fusion feature vector with each text description feature vector in the text description feature vector library;
The prediction unit is used for regarding the matching probabilities P as a 1×N feature vector, inputting the feature vector into the work order classification multi-layer perceptron for work order prediction, and outputting a service work order type s and a specific probability value q, where N represents the total number of feature vectors in the text description feature vector library;

The formulas for calculating the service work order type s and the specific probability value q are as follows:

$$s = \arg\max\left(\mathrm{Softmax}\left(\mathrm{FC}_2\left(\mathrm{GeLU}\left(\mathrm{FC}_1(P)\right)\right)\right)\right)$$

$$q = \max\left(\mathrm{Softmax}\left(\mathrm{FC}_2\left(\mathrm{GeLU}\left(\mathrm{FC}_1(P)\right)\right)\right)\right)$$

where FC_1 and FC_2 are the fully connected layers, and GeLU and Softmax are the activation functions after the first and the second fully connected layer, respectively.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the street cleanliness analysis method according to any one of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the street cleanliness analysis method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410259533.5A CN117853883B (en) | 2024-03-07 | 2024-03-07 | Street cleanliness analysis method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410259533.5A CN117853883B (en) | 2024-03-07 | 2024-03-07 | Street cleanliness analysis method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117853883A (en) | 2024-04-09
CN117853883B (en) | 2024-05-31
Family
ID=90531497
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410259533.5A Active CN117853883B (en) | 2024-03-07 | 2024-03-07 | Street cleanliness analysis method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117853883B (en) |
-
2024
- 2024-03-07 CN CN202410259533.5A patent/CN117853883B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165582A (en) * | 2018-08-09 | 2019-01-08 | 河海大学 | A kind of detection of avenue rubbish and cleannes appraisal procedure |
WO2020206392A1 (en) * | 2019-04-05 | 2020-10-08 | Verma Pramod Kumar | Voice-based social network |
WO2021197341A1 (en) * | 2020-04-03 | 2021-10-07 | 速度时空信息科技股份有限公司 | Monocular image-based method for updating road signs and markings |
CN115810165A (en) * | 2022-11-21 | 2023-03-17 | 山西依迅北斗空间技术有限公司 | Road cleanliness detection method and device, electronic equipment and storage medium |
CN116245828A (en) * | 2023-02-09 | 2023-06-09 | 复旦大学附属儿科医院 | Chest X-ray quality evaluation method integrating knowledge in medical field |
CN116525100A (en) * | 2023-04-26 | 2023-08-01 | 脉景(杭州)健康管理有限公司 | Traditional Chinese medicine prescription reverse verification method and system based on label system |
CN117523177A (en) * | 2023-11-09 | 2024-02-06 | 北京航天拓扑高科技有限责任公司 | Gas pipeline monitoring system and method based on artificial intelligent hybrid big model |
Non-Patent Citations (1)
Title |
---|
Zhu Cheng (朱骋), "Research on Key Technologies for Deep-Learning-Based Ground Cleanliness Grade Assessment," China Master's Theses Full-text Database, No. 02, 2023-02-15, pp. 1-75 *
Also Published As
Publication number | Publication date |
---|---|
CN117853883A (en) | 2024-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||