CN117853883B - Street cleanliness analysis method and device, computer equipment and storage medium - Google Patents


Publication number
CN117853883B
CN117853883B (application number CN202410259533.5A)
Authority
CN
China
Prior art keywords
feature vector
image
garbage
street
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410259533.5A
Other languages
Chinese (zh)
Other versions
CN117853883A (en)
Inventor
刘子伟
姚钊盈
王俊宜
林伯明
Current Assignee
Shenzhen Wanwuyun Technology Co ltd
Original Assignee
Shenzhen Wanwuyun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Wanwuyun Technology Co ltd filed Critical Shenzhen Wanwuyun Technology Co ltd
Priority to CN202410259533.5A priority Critical patent/CN117853883B/en
Publication of CN117853883A publication Critical patent/CN117853883A/en
Application granted granted Critical
Publication of CN117853883B publication Critical patent/CN117853883B/en


Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06V 20/39: Urban scenes (Scenes; scene-specific elements; categorising the entire scene; outdoor scenes)
    • G06N 3/045: Combinations of networks (computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology)
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI] (image preprocessing)
    • G06V 10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis
    • G06V 10/764: Recognition or understanding using classification, e.g. of video objects
    • G06V 10/806: Fusion of extracted features (combining data at the sensor, preprocessing, feature extraction or classification level)
    • G06V 10/82: Recognition or understanding using neural networks


Abstract

The invention discloses a street cleanliness analysis method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring street images, inputting them into a target detection model for garbage identification, and outputting alarm images together with the target detection information they contain; inputting each alarm image and its corresponding target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features; fusing the global image feature with the local target image features to obtain a fusion feature vector; matching corresponding text description vectors to the fusion feature vector based on preset target object description rules, and outputting matching probabilities; and treating the matching probabilities as a 1×N feature vector, inputting it into a work-order classification multi-layer perceptron for work-order prediction, and outputting the service work-order type and its probability value. The method can intelligently analyze the quantity, type, position, and other attributes of garbage on street pavements in real time, improving the standardization and operating efficiency of urban sanitation services.

Description

Street cleanliness analysis method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a street cleanliness analysis method and apparatus, a computer device, and a storage medium.
Background
As urbanization deepens, providing citizens with efficient and accurate basic services and building a good living environment are key indicators of a city's service capability. Among the many services a city provides, urban sanitation is the most basic and the one citizens perceive most directly.
Current urban sanitation service modes fall roughly into three types. The first is manual service: traditional urban sanitation usually relies on manual inspection with on-the-spot handling, which is flexible but inefficient. The second is the unmanned sweeper, which addresses the efficiency problem but cannot handle special situations such as garbage exposed in irregular terrain (green belts, flower beds, and the like) or larger pieces of garbage. The third combines the first two: machine inspection followed by manual dispatch and handling. Even the third type, however, analyzes the scene too coarsely and still requires back-office staff to classify the findings.
Disclosure of Invention
The invention aims to provide a street cleanliness analysis method and apparatus, a computer device, and a storage medium, so as to solve the problems that existing street cleanliness analysis is not fine-grained enough and requires manual classification.
In a first aspect, an embodiment of the present invention provides a method for analyzing street cleanliness, including:
acquiring a street image, inputting it into a target detection model for garbage identification, and outputting an alarm image and the target detection information in the alarm image;
inputting the alarm image and the corresponding target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features;
fusing the global image feature with the local target image features to obtain a fusion feature vector;
matching corresponding text description vectors to the fusion feature vector based on preset target object description rules, and outputting matching probabilities;
and treating the matching probabilities as a 1×N feature vector, inputting it into a work-order classification multi-layer perceptron for work-order prediction, and outputting the service work-order type and its probability value.
In a second aspect, an embodiment of the present invention provides a street cleanliness analyzing apparatus, including:
a recognition unit, configured to acquire a street image, input it into a target detection model for garbage identification, and output an alarm image and the target detection information in the alarm image;
a feature extraction unit, configured to input the alarm image and the corresponding target detection information into a Clip model for feature extraction and output the corresponding global image feature and local target image features;
a feature fusion unit, configured to fuse the global image feature with the local target image features to obtain a fusion feature vector;
a matching unit, configured to match corresponding text description vectors to the fusion feature vector based on preset target object description rules and output matching probabilities;
and a prediction unit, configured to treat the matching probabilities as a 1×N feature vector, input it into the work-order classification multi-layer perceptron for work-order prediction, and output the service work-order type and its probability value.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the street cleanliness analysis method according to the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium, wherein the computer readable storage medium stores a computer program, which when executed by a processor, causes the processor to perform the street cleanliness analysis method as described in the first aspect above.
The embodiments of the invention disclose a street cleanliness analysis method and device, computer equipment, and a storage medium. The method comprises: acquiring street images, inputting them into a target detection model for garbage identification, and outputting alarm images and the target detection information they contain; inputting each alarm image and its target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features; fusing the global image feature with the local target image features to obtain a fusion feature vector; matching corresponding text description vectors to the fusion feature vector based on preset target object description rules, and outputting matching probabilities; and treating the matching probabilities as a 1×N feature vector, inputting it into a work-order classification multi-layer perceptron for work-order prediction, and outputting the service work-order type and its probability value. The embodiments can intelligently analyze the quantity, type, position, and other attributes of garbage on street pavements in real time and automatically dispatch corresponding cleaning work orders to sanitation workers of different trades according to the analysis result, effectively improving the standardization and operating efficiency of urban sanitation services.
Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings illustrate only some embodiments of the invention; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a method for analyzing street cleanliness according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a sub-flowchart of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flow of a street cleanliness analysis method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image feature fusion subnetwork according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a work-order classification multi-layer perceptron sub-network provided by an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a street cleanliness analyzing device according to an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the protection scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a flow chart of a street cleanliness analysis method according to an embodiment of the invention.
As shown in FIG. 1, the method includes steps S101-S105.
S101, acquiring a street image, inputting it into a target detection model for garbage identification, and outputting an alarm image and the target detection information in the alarm image;
In this step, streets may be patrolled by a manned or unmanned vehicle; a street video stream is captured by a vehicle-mounted camera, and the street images in the video stream are fed frame by frame into a target detection algorithm (a YOLOv model may be adopted; other target detection algorithms achieve the same effect). When garbage appears, the street image containing the garbage is output as an alarm image, and the target detection information b_i (i = 0, 1, ...) in the alarm image is obtained. Each b_i is the tuple {x_bi, y_bi, w_bi, h_bi, c_bi, p_bi}, where x_bi and y_bi are the abscissa and ordinate of the center point of the rectangular box around the detected garbage, w_bi and h_bi are the width and height of the box, c_bi is the garbage type, and p_bi is the corresponding confidence.
In this step, the garbage types identified by the target detection algorithm may include scattered garbage, packed garbage, greening garbage, piled garbage, building and furniture garbage, and so on. In a specific implementation, the garbage types to be detected by the edge-side target detection algorithm can be chosen by weighing the service requirements against the achievable algorithm performance.
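The detection record b_i described above can be sketched as a plain data structure. The field names and example values below are illustrative assumptions for exposition, not the patent's actual data format:

```python
# Minimal sketch of the detection record b_i = {x_bi, y_bi, w_bi, h_bi, c_bi, p_bi}.
# Field names and example values are illustrative assumptions.

def make_detection(x, y, w, h, garbage_type, confidence):
    """One detection b_i: box center (x, y), box size (w, h), type c, confidence p."""
    return {"x": x, "y": y, "w": w, "h": h,
            "type": garbage_type, "confidence": confidence}

# One alarm image may carry several detections b_0, b_1, ...
detections = [
    make_detection(320, 240, 80, 60, "scattered garbage", 0.91),
    make_detection(500, 400, 150, 120, "packed garbage", 0.84),
]
```

Downstream steps read the box geometry to crop regions of interest, and the type/confidence pair for work-order reasoning.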
S102, inputting the alarm image and the corresponding target detection information into a Clip model for feature extraction, and outputting the corresponding global image feature and local target image features;
In this step, Clip is a large pre-trained model: its backbone is an open-source model trained on 400 million high-quality image-text pairs, giving it strong image-text alignment and good performance in zero-shot and few-shot settings. The alarm image and the target detection information b_i obtained in step S101 are input into the Clip model for feature extraction, yielding the corresponding global image feature and local target image features.
S103, fusing the global image feature with the local target image features to obtain a fusion feature vector;
In this step, an image feature fusion sub-network (shown in fig. 5) is designed to fuse the global image feature with the local target image features.
S104, matching corresponding text description vectors to the fusion feature vector based on preset target object description rules, and outputting matching probabilities;
In this step, to connect the images with the service rules, common concept words used in sanitation work can be collected according to service experts' experience to formulate the target object description rules; corresponding text descriptions are then matched to the fusion feature vector, facilitating the classification of the service work order.
S105, treating the matching probabilities as a 1×N feature vector, inputting it into a work-order classification multi-layer perceptron for work-order prediction, and outputting the service work-order type and its probability value.
In this embodiment, based on steps S101 to S105, the application obtains inspection images in real time through the vehicle-mounted camera, detects target garbage on the street in real time at the edge, and uploads the alarm images and target detection information to the cloud. The cloud model predicts, from the alarm image and target detection information, a text description of the alarm image that is strongly associated with the service rules, and the work-order classification multi-layer perceptron (see fig. 6) automatically classifies the description into the corresponding work-order processing type. This effectively improves work-order dispatch efficiency and, in turn, the efficiency of human-machine cooperation in urban sanitation services.
In one embodiment, as shown in fig. 2, step S102 includes:
S201, acquiring the region of interest I_ri in the corresponding alarm image according to the target detection information;
S202, normalizing the region of interest I_ri and the corresponding alarm image I, then scaling both to the same size to obtain the region of interest I'_ri and the alarm image I';
S203, inputting the region of interest I'_ri and the alarm image I' separately into the image encoder of the Clip model for feature extraction, obtaining the local target image features and the global image feature respectively.
In this embodiment, the image encoder (Image-Encoder) of the Clip model may employ a Vision Transformer backbone. The feature extraction proceeds as follows: according to the target detection information b_i, the region of interest in the corresponding alarm image is acquired and recorded as I_ri; the region of interest I_ri and the alarm image I are normalized and scaled to the same size, e.g. 224×224, giving I'_ri and I'; these are then input separately into the image encoder of the Clip model for feature extraction, producing the local target image features (Region Embedding) and the global image feature (Image Embedding).
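The ROI extraction and resizing described here can be sketched with NumPy. The nearest-neighbour resampler and the 224×224 target size are simple stand-ins for whatever preprocessing the actual Clip image encoder pipeline uses:

```python
import numpy as np

def crop_roi(img, x, y, w, h):
    """Cut the region of interest I_ri from alarm image I.
    (x, y) is the box center from b_i; convert to corners and clip to the image."""
    x0, y0 = max(0, int(x - w / 2)), max(0, int(y - h / 2))
    x1, y1 = min(img.shape[1], int(x + w / 2)), min(img.shape[0], int(y + h / 2))
    return img[y0:y1, x0:x1]

def resize_nearest(img, size=224):
    """Nearest-neighbour resize to size x size (a simple stand-in resampler)."""
    rows = (np.arange(size) * img.shape[0] / size).astype(int)
    cols = (np.arange(size) * img.shape[1] / size).astype(int)
    return img[rows][:, cols]

def preprocess(img, size=224):
    """Normalize pixel values to [0, 1] and scale to the encoder input size."""
    return resize_nearest(img.astype(np.float32) / 255.0, size)
```

Both I'_ri and I' would then be fed to the image encoder to obtain Region Embedding and Image Embedding respectively.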
In one embodiment, as shown in fig. 3, step S103 includes:
S301, performing a concat splicing operation and a conv convolution operation on all local target image features in the alarm image to obtain the fused local target image feature;
S302, performing an add superposition operation and a conv convolution operation on the fused local target image feature and the corresponding global image feature to obtain a primary fusion feature vector;
S303, inputting the primary fusion feature vector into an attention mechanism module for weight feature extraction, and multiplying the weight features with the original primary fusion feature vector to obtain the final fusion feature vector.
In this embodiment, the global image feature and the local target image features are fused by the designed image feature fusion sub-network (shown in fig. 5). First, a concat splicing operation is performed on all local target image features in the alarm image, followed by a conv convolution with a 1×1 kernel, giving the fused local target image feature. Next, an add superposition is performed on the fused local feature and the corresponding global image feature, and one conv convolution fuses them into the primary fusion feature vector. The primary fusion feature vector is then fed into the constructed attention mechanism module (Attention Block), which has two branches: branch 1 carries the original feature information of the primary fusion feature vector, while branch 2 consists of two fully connected layers (FC) with their respective activation layers, a ReLU and a Sigmoid. The features produced by branch 2 are called the weight features; multiplying them with the original features of branch 1 yields the final fusion feature vector (Image-Region Embedding), which incorporates the local target information.
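The fusion sub-network can be sketched in NumPy for the vector case, where a 1×1 convolution over embedding channels reduces to a matrix product. The dimensions and random weights are placeholders, not the patent's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 512  # assumed embedding width of the Clip image encoder

def relu(x):
    return np.maximum(x, 0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def fuse(global_feat, local_feats, W_local, W_merge, W1, W2):
    # concat all local target features, then a 1x1-conv-like projection back to D
    local = relu(np.concatenate(local_feats) @ W_local)
    # add superposition with the global feature, one more conv-like projection
    primary = relu((global_feat + local) @ W_merge)
    # attention branch 2: FC -> ReLU -> FC -> Sigmoid gives per-channel weights
    weights = sigmoid(relu(primary @ W1) @ W2)
    # multiply the weights with branch 1 (the original primary features)
    return primary * weights

local_feats = [rng.normal(size=D) for _ in range(2)]  # two detected targets
W_local = rng.normal(size=(2 * D, D)) * 0.02
W_merge = rng.normal(size=(D, D)) * 0.02
W1 = rng.normal(size=(D, D // 4)) * 0.02
W2 = rng.normal(size=(D // 4, D)) * 0.02
fused = fuse(rng.normal(size=D), local_feats, W_local, W_merge, W1, W2)
```

In a real network the concat projection would have to accommodate a variable number of detections; fixing it at two here keeps the sketch minimal.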
In one embodiment, as shown in fig. 4, step S104 includes:
S401, constructing a text prompt word library according to the garbage types, garbage positions, and garbage quantities in the target object description rules;
S402, inputting the text prompt word library into the text encoder of the Clip model for encoding, and outputting the text description feature vector library;
S403, performing norm normalization on the fusion feature vector and the text description feature vector library, taking their dot products, and feeding the results into a sigmoid layer to obtain the matching probability between the fusion feature vector and each text description feature vector in the library.
In this embodiment, the construction in step S401 can follow the example in Table 1 below.

TABLE 1

Attribute          Concept words
Garbage type       scattered garbage, packed garbage, greening garbage, piled garbage, building and furniture garbage
Garbage position   pavement, sidewalk, green belt, flower bed, garbage point
Garbage quantity   none (clean), small amount, moderate amount, large amount
The text encoder of the Clip model has context analysis capability and may adopt a Transformer backbone; constructing text prompt words from the concept words effectively improves the accuracy of the textual image descriptions. From the concept words in Table 1, prompts can be constructed such as: "a picture of a road surface with a small amount of scattered garbage." In total, 5 × 4 × 3 + 1 = 61 prompts can be constructed, with only 1 prompt for a clean street. All prompt words are input into the Clip text encoder to obtain the corresponding text description feature vectors (Text Embedding), i.e. the text description feature vector library.
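The prompt library can be built mechanically from the concept words. Note that Table 1 lists five positions while the stated count 5 × 4 × 3 + 1 = 61 implies four, so this sketch keeps four positions (dropping "garbage point") to reproduce the stated total; the exact prompt wording is likewise an assumption:

```python
from itertools import product

garbage_types = ["scattered garbage", "packed garbage", "greening garbage",
                 "piled garbage", "building and furniture garbage"]
positions = ["the pavement", "the sidewalk", "the green belt", "the flower bed"]
amounts = ["a small amount of", "a moderate amount of", "a large amount of"]

# The single prompt for a clean street, plus one per (type, position, amount).
prompts = ["a picture of a clean street"]
prompts += [f"a picture of {pos} with {amt} {typ}"
            for typ, pos, amt in product(garbage_types, positions, amounts)]
```

Each prompt would then be encoded by the Clip text encoder into the text description feature vector library.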
Specifically, step S403 includes:
First, the fusion feature vector and each text description feature vector are norm-normalized as follows:

f' = f / sqrt( Σ_n (f_n)² )

where f is the fusion feature vector or a text description feature vector in the text description feature vector library, and f_n denotes the nth value in that feature vector.

The norm normalization yields the feature vector f_im for the image and the feature vector f_t for each text, where im denotes image and t denotes text.

Finally, the feature vectors f_im and f_t are input into the formula

P = sigmoid( f_im · f_t^T )

to compute the matching probability P, where f_t^T denotes the transpose of the text description feature vector f_t.
Based on the process of step S403, the matching probability of the fusion feature vector with each text description feature vector in the text description feature vector library can be obtained.
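The normalization, dot product, and sigmoid of step S403 amount to the following NumPy sketch, with random vectors standing in for the real embeddings:

```python
import numpy as np

def l2_normalize(f):
    """Norm normalization: f' = f / sqrt(sum_n f_n^2)."""
    return f / np.sqrt(np.sum(f ** 2))

def match_probabilities(image_feat, text_feats):
    """P = sigmoid(f_im . f_t^T) for every text vector in the library."""
    f_im = l2_normalize(image_feat)
    sims = np.array([f_im @ l2_normalize(f_t) for f_t in text_feats])
    return 1 / (1 + np.exp(-sims))  # sigmoid layer

rng = np.random.default_rng(1)
P = match_probabilities(rng.normal(size=512), rng.normal(size=(61, 512)))
```

Because the dot product of two unit vectors lies in [-1, 1], each matching probability falls strictly between 0 and 1.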
In one embodiment, step S105 includes:
treating the matching probabilities P as a 1×N feature vector, where N is the total number of feature vectors in the text description feature vector library;
calculating the service work-order type s and its probability value q as follows:

p = Softmax( FC_2( GeLU( FC_1(P) ) ) ),  s = argmax(p),  q = p_s

where FC_1 and FC_2 are the fully connected layers, and GeLU and Softmax are the activation functions applied after the first and second fully connected layers respectively.
In this embodiment, to map the service-rule classification obtained in the above steps to an actual work-order class, the matching probabilities P between the alarm image and the text descriptions are treated as a 1×61 feature vector. A multi-layer perceptron sub-network (MLP) with two fully connected layers is constructed as shown in fig. 6, and the vector P is input into it to obtain the predicted service work-order type s and its probability value q. The value of s is the serial number of the service work-order type; for example, if the types are no-processing, machine cleaning, manual cleaning, and combined human-machine garbage cleaning, then s = 0 means no processing is needed.
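The two-layer perceptron of step S105 can be sketched as follows; the hidden width, the number of work-order types, and the random weights are illustrative assumptions:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_work_order(P, W1, b1, W2, b2):
    """Softmax(FC2(GeLU(FC1(P)))) -> (type index s, probability q)."""
    probs = softmax(gelu(P @ W1 + b1) @ W2 + b2)
    s = int(np.argmax(probs))  # work-order type serial number
    q = float(probs[s])        # its probability value
    return s, q, probs

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(61, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 4)) * 0.1, np.zeros(4)  # e.g. 4 work-order types
s, q, probs = classify_work_order(rng.random(61), W1, b1, W2, b2)
```

With s = 0 reserved for "no processing", a dispatcher would route the work order only when s > 0 and q clears some confidence threshold.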
Some additional details of the above steps are provided below.
In step S101, before the target detection algorithm YOLOv is used, a dataset is constructed from manually collected or open-source data. The data are fixed-size images, and the labels are the position information, type, and so on of the target garbage in each image. An optimal model is obtained by optimizing the objective function L_yolo on this dataset and is then deployed. The objective function is:

L_yolo = α L_cls + β L_bbox + γ L_obj

where L_cls, L_bbox, and L_obj are the category loss, position loss, and target loss respectively, and α, β, and γ are adjustable hyper-parameters.
In step S103, the image feature fusion sub-network likewise requires manually collected or open-source data to build a training set before use. Here the data are the original street images and the corresponding target position information from step S101, and the labels are the corresponding text descriptions encoded as multi-label one-hot (multi-hot) vectors. For example, if the 61 text descriptions are ordered as "clean", "a picture of a road surface with a small amount of scattered garbage", "a picture of a green belt with a large amount of packed garbage", ..., then the multi-hot code 011... means the picture simultaneously contains a small amount of scattered garbage on the road surface and a large amount of packed garbage on the green belt.
The loss function L_clip can adopt a multi-label cross-entropy loss:

L_clip = −Σ_j [ a_j · log(â_j) + (1 − a_j) · log(1 − â_j) ]

where j is the sample index, and a_j and â_j are the true and predicted label distributions respectively.
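A minimal NumPy version of this multi-label cross-entropy follows; averaging over labels is a normalization choice the patent does not specify:

```python
import numpy as np

def multilabel_bce(a, a_hat, eps=1e-12):
    """L_clip = -mean_j [ a_j*log(a_hat_j) + (1 - a_j)*log(1 - a_hat_j) ]."""
    a_hat = np.clip(a_hat, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(a * np.log(a_hat) + (1 - a) * np.log(1 - a_hat)))

a = np.array([0.0, 1.0, 1.0, 0.0])                        # multi-hot ground truth
good = multilabel_bce(a, np.array([0.1, 0.9, 0.8, 0.2]))  # close predictions
bad = multilabel_bce(a, np.array([0.9, 0.1, 0.2, 0.8]))   # inverted predictions
```

As expected, the loss is small when predictions agree with the multi-hot labels and large when they are inverted.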
In step S105, the work-order classification multi-layer perceptron also requires a training set before use. Its data are the text description feature vectors representing the alarm images obtained in step S104, the labels are one-hot codes of the service work-order types, and the loss function L_order can be a cross-entropy function.
In addition, since the output of step S103 feeds directly into step S105, the networks of steps S103 and S105 can be trained jointly, provided the intermediate loss function L_clip is retained, so as to obtain a better training result.
The embodiment of the invention also provides a street cleanliness analyzing device, which is used for executing any embodiment of the street cleanliness analyzing method. In particular, referring to fig. 7, fig. 7 is a schematic block diagram of a street cleanliness analyzing apparatus according to an embodiment of the present invention.
As shown in fig. 7, the street cleanliness analyzing device 700 includes: an identification unit 701, a feature extraction unit 702, a feature fusion unit 703, a matching unit 704, and a prediction unit 705.
The recognition unit 701 is used for acquiring street images, inputting a target detection model for garbage recognition processing, and outputting an alarm image and target detection information in the alarm image;
The feature extraction unit 702 is configured to input the alarm image and the corresponding target detection information into the Clip model for feature extraction, and output the corresponding global image features and local target image features;
A feature fusion unit 703, configured to perform fusion processing on the global image feature and the local target image feature, so as to obtain a fusion feature vector;
The matching unit 704 is configured to match the fusion feature vector with a corresponding text description vector based on a preset target object description rule, and output a matching probability;
and the prediction unit 705 is used for regarding the matching probability as a 1×N feature vector, inputting it into the work-order classification multi-layer perceptron for work-order prediction, and outputting the work-order type and the specific probability value.
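A pure-Python sketch of a two-layer perceptron head of this shape, with GeLU after the first fully connected layer and Softmax after the second; the toy weights and the tanh-based GeLU approximation are assumptions for illustration, not the patented implementation:

```python
import math

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x ** 3)))

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def matvec(W, x, b):
    # One fully connected layer: W @ x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def work_order_head(p, W1, b1, W2, b2):
    """Two fully connected layers, GeLU after the first and Softmax
    after the second; returns (predicted type index s, probability
    vector q)."""
    h = [gelu(v) for v in matvec(W1, p, b1)]
    q = softmax(matvec(W2, h, b2))
    s = max(range(len(q)), key=q.__getitem__)
    return s, q
```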
The device can acquire inspection images in real time through a vehicle-mounted camera and detect target garbage on the street at the edge in real time, uploading the alarm image and target detection information to the cloud. The cloud model then predicts, from the alarm image and target detection information, a text description of the alarm image that is strongly correlated with the business rules, and the work-order classification multi-layer perceptron automatically classifies that text description into the corresponding work-order processing type. This effectively improves work-order dispatching efficiency and thereby the combined human-machine working efficiency in urban sanitation services.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The street cleanliness analyzing apparatus described above may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 800 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to FIG. 8, the computer device 800 includes a processor 802, memory, and a network interface 805 connected by a system bus 801, wherein the memory may include a non-volatile storage medium 803 and an internal memory 804.
The nonvolatile storage medium 803 may store an operating system 8031 and a computer program 8032. The computer program 8032, when executed, causes the processor 802 to perform a street cleanliness analysis method.
The processor 802 is used to provide computing and control capabilities to support the operation of the overall computer device 800.
The internal memory 804 provides an environment for the execution of a computer program 8032 in the non-volatile storage medium 803, which computer program 8032, when executed by the processor 802, causes the processor 802 to perform a street cleanliness analysis method.
The network interface 805 is used for network communication such as providing transmission of data information and the like. It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of some of the architecture associated with the present inventive arrangements and is not limiting of the computer device 800 to which the present inventive arrangements may be applied, and that a particular computer device 800 may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 8 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 8, and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 802 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the street cleanliness analysis method of an embodiment of the present invention.
The storage medium is a physical, non-transitory storage medium, and may be, for example, a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (7)

1. A method of street cleanliness analysis, comprising:
Acquiring a street image, inputting a target detection model to perform garbage identification processing, and outputting an alarm image and target detection information in the alarm image, wherein the alarm image is the street image with garbage, and the target detection information comprises the position of the detected garbage, the garbage type and the corresponding confidence level;
Acquiring a region of interest I ri in a corresponding alarm image according to the target detection information; scaling the same size after normalizing the region of interest I ri and the corresponding alarm image I to obtain a region of interest I 'ri and a corresponding alarm image I'; respectively inputting the region of interest I 'ri and the corresponding alarm image I' into an image encoder of a Clip model to perform feature extraction to respectively obtain local target image features and global image features;
performing fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
Constructing a text prompt word library according to the garbage types, garbage positions and garbage quantity in the object description rule; inputting the text prompt word stock into a text encoder of a Clip model for encoding processing, and outputting a text description feature vector stock; performing norm normalization processing on the fusion feature vector and a text description feature vector library to obtain a feature vector f im and a feature vector f t respectively, performing dot product operation, and then inputting a sigmoid formula to perform matching probability P calculation to obtain matching probability P of the fusion feature vector and each text description feature vector in the text description feature vector library;
Regarding the matching probability P as a 1×N feature vector, inputting it into a work-order classification multi-layer perceptron for work-order prediction, and outputting the service work-order type s and the specific probability value q, wherein N represents the total number of feature vectors in the text description feature vector library;
The formula for calculating the service work-order type s and the specific probability value q is:
q = Softmax(FC(GeLU(FC(P)))), s = argmax(q)
wherein FC is a fully connected layer, and GeLU and Softmax are the activation functions after the first fully connected layer and the second fully connected layer, respectively.
2. The street cleanliness analyzing method according to claim 1, wherein the acquiring street images and inputting object detection models for garbage recognition processing, outputting warning images and object detection information in the warning images, comprises:
acquiring a street video stream based on street patrol;
Inputting a street image from the street video stream into the target detection model for garbage type and garbage position identification processing, outputting each street image identified as containing garbage as an alarm image, and obtaining the target detection information b i, i = 0, 1, …, in the alarm image, wherein the target detection information b i specifically contains {x bi, y bi, w bi, h bi, c bi, p bi}, in which x bi, y bi, w bi and h bi are respectively the abscissa of the center point of the rectangular frame of the detected garbage position, the ordinate of the center point, the width of the rectangular frame and the height of the rectangular frame, c bi is the garbage type, and p bi is the corresponding confidence.
3. The method of claim 1, wherein the fusing the global image features and the local target image features to obtain a fused feature vector comprises:
Performing concat splicing operation and conv convolution operation on all local target image features in the alarm image to obtain fused local target image features;
Performing add superposition operation and conv convolution operation on the fused local target image features and the corresponding global image features to obtain a preliminary fused feature vector;
And inputting the original feature information of the primary fusion feature vector into an attention mechanism module for weight feature extraction, and multiplying the weight feature with the original feature information of the primary fusion feature vector to obtain a final fusion feature vector.
4. The method for analyzing street cleanliness according to claim 1, wherein the performing norm normalization processing on the fused feature vector and the text description feature vector library, and then performing a dot product operation, and inputting a sigmoid layer to obtain a matching probability of the fused feature vector and each text description feature vector in the text description feature vector library, includes:
And carrying out norm normalization processing according to the following formula:
f' = f / ‖f‖, wherein ‖f‖ = sqrt(Σ n (f n)²)
wherein f is the fusion feature vector or a text description feature vector in the text description feature vector library, and f n denotes the nth value in the feature vector;
Performing norm normalization processing on the fusion feature vector and the text description feature vector to obtain a feature vector f im and a feature vector f t respectively, wherein im denotes image and t denotes text;
inputting the feature vector f im and the feature vector f t into the following sigmoid formula to calculate the matching probability P:
P = sigmoid(f im · f t ᵀ) = 1 / (1 + exp(−f im · f t ᵀ))
wherein f t ᵀ represents the transpose of the text description feature vector f t.
5. A street cleanliness analyzing apparatus, comprising:
the recognition unit is used for acquiring street images, inputting a target detection model for garbage recognition processing, and outputting an alarm image and target detection information in the alarm image, wherein the alarm image is a street image with garbage, and the target detection information comprises the position of the detected garbage, the garbage type and the corresponding confidence level;
The feature extraction unit is used for acquiring a region of interest I ri in the corresponding alarm image according to the target detection information; scaling the same size after normalizing the region of interest I ri and the corresponding alarm image I to obtain a region of interest I 'ri and a corresponding alarm image I'; respectively inputting the region of interest I 'ri and the corresponding alarm image I' into an image encoder of a Clip model to perform feature extraction to respectively obtain local target image features and global image features;
The feature fusion unit is used for carrying out fusion processing on the global image features and the local target image features to obtain fusion feature vectors;
The matching unit is used for constructing a text prompt word stock according to the garbage type, the garbage position and the garbage quantity in the object description rule; inputting the text prompt word stock into a text encoder of a Clip model for encoding processing, and outputting a text description feature vector stock; performing norm normalization processing on the fusion feature vector and a text description feature vector library to obtain a feature vector f im and a feature vector f t respectively, performing dot product operation, and then inputting a sigmoid formula to perform matching probability P calculation to obtain matching probability P of the fusion feature vector and each text description feature vector in the text description feature vector library;
The prediction unit is used for regarding the matching probability P as a 1×N feature vector, inputting it into the work-order classification multi-layer perceptron for work-order prediction, and outputting the service work-order type s and the specific probability value q, wherein N represents the total number of feature vectors in the text description feature vector library;
The formula for calculating the service work-order type s and the specific probability value q is:
q = Softmax(FC(GeLU(FC(P)))), s = argmax(q)
wherein FC is a fully connected layer, and GeLU and Softmax are the activation functions after the first fully connected layer and the second fully connected layer, respectively.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the street cleanliness analysis method according to any one of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the street cleanliness analysis method according to any one of claims 1 to 4.
CN202410259533.5A 2024-03-07 2024-03-07 Street cleanliness analysis method and device, computer equipment and storage medium Active CN117853883B (en)

Publications (2)

Publication Number Publication Date
CN117853883A 2024-04-09
CN117853883B 2024-05-31

