CN111723136A - Single-dimensional clustering analysis method for classified and graded treatment of grid events - Google Patents

Info

Publication number
CN111723136A
Authority
CN
China
Prior art keywords
grid event
grid
text
event
clusters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911143455.8A
Other languages
Chinese (zh)
Inventor
钱华
姜永华
钱建华
王巧荣
房查
张宏斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Fablesoft Co ltd
Political And Legal Committee Of Nantong Municipal Committee Of Communist Party Of China
Original Assignee
Jiangsu Fablesoft Co ltd
Political And Legal Committee Of Nantong Municipal Committee Of Communist Party Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Fablesoft Co ltd and Political And Legal Committee Of Nantong Municipal Committee Of Communist Party Of China
Priority to CN201911143455.8A
Publication of CN111723136A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a single-dimensional clustering analysis method for classified and graded treatment of grid events, comprising the following steps: (1) acquiring grid event records; (2) cleaning the acquired grid event records and keeping only the grid event situation text; (3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors; (4) clustering the grid event situation text data with a single-dimensional clustering algorithm; (5) generating K grid event type clusters of different types and storing them in HBase; and (6) identifying the high-occurrence grid event types. The scheme efficiently performs fusion analysis on scattered grid event records, supports accurate key monitoring and early warning of high-occurrence grid events, and improves the efficiency of classified and graded treatment of grid events.

Description

Single-dimensional clustering analysis method for classified and graded treatment of grid events
Technical Field
The invention relates to an analysis method, in particular to a single-dimensional clustering analysis method for classified and graded treatment of grid events, and belongs to the technical field of grid event analysis.
Background
At present, society faces many types of events. Under the grid-based social governance model, analyzing the events and disputes within a grid generally requires grassroots grid workers to record and report the events occurring in the grid, such as public security incidents, conflicts and disputes, fire safety issues, and food hygiene and safety issues.
However, relying only on manual recording and reporting provides no standard for judging the accuracy of grid event descriptions and no fusion analysis of the scattered event records. It is therefore difficult to carry out targeted key monitoring and early warning for the high-occurrence grid event types in a grid area, which seriously reduces the efficiency of grid event handling. A new solution to these technical problems is urgently needed.
Disclosure of Invention
To address the problems in the prior art, the invention provides a single-dimensional clustering analysis method for classified and graded treatment of grid events. The scheme efficiently performs fusion analysis on scattered grid event records, supports accurate key monitoring and early warning of high-occurrence grid events, and improves the efficiency of classified and graded treatment of grid events.
To achieve the above object, the technical solution of the invention is a single-dimensional clustering analysis method for classified and graded treatment of grid events, comprising the following steps:
(1) acquiring grid event records;
(2) cleaning the acquired grid event records and keeping only the grid event situation text;
(3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors;
(4) clustering the grid event situation data with a single-dimensional clustering algorithm;
(5) generating K grid event type clusters of different types and storing them in HBase;
(6) identifying the high-occurrence grid event types.
As an improvement of the present invention, step (1) specifically comprises: extracting grid event records from a multi-source database.
As an improvement of the present invention, step (2) specifically comprises: cleaning the extracted grid event records with an ETL tool, removing fields such as the event location and the event time, and keeping only the grid event situation text.
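The cleaning in step (2) can be illustrated with a minimal Python sketch. The patent names only "an ETL tool", so this is a stand-in, not the tool the applicants used. The field name `event_text`, the sample records, and the choice to skip records whose text is empty are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical field name: the source only says the "event situation text" is kept.
TEXT_FIELD = "event_text"


def keep_situation_text(records: Iterable[Dict[str, str]]) -> List[str]:
    """Drop every field except the situation text (place, time, etc. are discarded)."""
    texts = []
    for record in records:
        text = (record.get(TEXT_FIELD) or "").strip()
        if text:  # assumption: records without any text are skipped
            texts.append(text)
    return texts


# Invented sample records, for illustration only.
records = [
    {"event_text": "Blocked fire exit in building 3", "place": "Grid 12", "time": "2019-11-01 09:30"},
    {"event_text": "  ", "place": "Grid 7", "time": "2019-11-02 14:00"},
    {"event_text": "Noise dispute between neighbours", "place": "Grid 12", "time": "2019-11-03 21:15"},
]
print(keep_situation_text(records))
```

Only the free-text description survives, which is the input the BERT model of step (3) expects.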
As an improvement of the present invention, step (3) specifically comprises: inputting a grid event situation text, calculating the weight values of the text with a BERT model, and outputting a multi-dimensional semantic feature vector of the text.
As an improvement of the present invention, step (4) comprises the following sub-steps:
(41) calculating the similarity value simVal between any one preselected grid event situation text and each of the remaining grid event situation texts, using the formula simVal = c·X1 + d·X2, where:
c: weight parameter, preset value range 0.8-0.9;
d: weight parameter, preset value range 0.1-0.2;
X1: cosine of the angle between the two vectors;
X2: normalized Euclidean distance between the two vectors;
(42) generating, according to the similarity results, N clusters, each containing all grid event situation texts similar to one preselected grid event situation text;
(43) selecting the N preselected grid event situation texts corresponding to the N grid event situation text clusters as central texts;
(44) calculating the similarity value simVal between any two of the N central texts, using the formula simVal = c·X1 + d·X2, where:
c = 0.9, c being a weight parameter;
d = 0.1, d being a weight parameter;
X1: cosine of the angle between the two vectors;
X2: normalized Euclidean distance between the two vectors;
(45) if two central texts are similar, deleting the smaller of the two clusters containing them; if two central texts are not similar, retaining both clusters;
(46) outputting the M clusters generated by the initial clustering;
(47) performing secondary de-duplication on the output M clusters, wherein M is smaller than N.
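Sub-steps (41) to (46) can be sketched in Python as follows. This is one possible reading, not the applicants' implementation. Three points are assumptions: the Euclidean distance is normalized as 1 / (1 + distance), the similarity threshold is 0.8, and every text is treated as a preselected text. The function names `sim_val` and `initial_clusters` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def sim_val(u: Vector, v: Vector, c: float = 0.9, d: float = 0.1) -> float:
    """simVal = c*X1 + d*X2 for two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # X1: cosine of the angle between the vectors (0.0 if either is all zeros).
    x1 = dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
    # X2: ASSUMED normalization, mapping distance 0 to 1 and large distances toward 0.
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    x2 = 1.0 / (1.0 + distance)
    return c * x1 + d * x2


def initial_clusters(vectors: List[Vector], threshold: float = 0.8) -> Dict[int, Set[int]]:
    """Return {centre index: member indices} after centre de-duplication."""
    # (41)-(42): one candidate cluster per preselected text.
    clusters = {
        i: {j for j, v in enumerate(vectors) if sim_val(vectors[i], v) >= threshold}
        for i in range(len(vectors))
    }
    # (43)-(45): compare centres pairwise; when two are similar, drop the smaller cluster.
    removed: Set[int] = set()
    centres = list(clusters)
    for pos, a in enumerate(centres):
        for b in centres[pos + 1:]:
            if a in removed or b in removed:
                continue
            if sim_val(vectors[a], vectors[b]) >= threshold:
                # On equal size the later centre is dropped (arbitrary tie-break).
                removed.add(a if len(clusters[a]) < len(clusters[b]) else b)
    # (46): the surviving M clusters.
    return {centre: members for centre, members in clusters.items() if centre not in removed}


# Toy 2-dimensional vectors standing in for BERT feature vectors.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(initial_clusters(vectors))
```

With these toy vectors, four candidate clusters (N = 4) collapse to two (M = 2), because texts 0 and 1 point in nearly the same direction, as do texts 2 and 3.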
As an improvement of the present invention, step (6) specifically comprises: sorting the K grid event type clusters of different types in descending order and identifying the high-occurrence grid event types.
Compared with the prior art, the invention has the following technical effects: the method clusters a large number of scattered grid event records with an improved clustering algorithm, obtains clusters of similar grid event types from the grid event situation texts, and accurately identifies the high-occurrence grid event types through statistical analysis, so that these types can be given key monitoring and early warning and comprehensive treatment center personnel can carry out efficient classified and graded treatment of grid events; compared with existing clustering algorithms, the algorithm of the scheme reduces the occupation of the big data computing resources that run the clustering algorithm.
Description of the Drawings
FIG. 1 is a flow chart of the single-dimensional clustering analysis method for classified and graded treatment of grid events according to the present invention;
FIG. 2 is a flow chart of the single-dimensional clustering algorithm of the present invention.
Detailed Description
To enhance understanding of the present invention, it is described in detail below with reference to the following example.
Example 1: referring to FIG. 1 and FIG. 2, a single-dimensional clustering analysis method for classified and graded treatment of grid events comprises the following steps:
(1) acquiring grid event records;
(2) cleaning the acquired grid event records and keeping only the grid event situation text;
(3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors;
(4) clustering the grid event data with a single-dimensional clustering algorithm;
(5) generating K grid event type clusters of different types and storing them in HBase;
(6) identifying the high-occurrence grid event types.
Step (1) is specifically: extracting grid event records from a multi-source database.
Step (2) is specifically: cleaning the extracted grid event records with an ETL tool, removing fields such as the event location and the event time, and keeping only the grid event situation text.
Step (3) is specifically: inputting a grid event situation text, calculating the weight values of the text with a BERT model, and outputting a multi-dimensional semantic feature vector of the text.
Step (4) comprises the following sub-steps:
(41) calculating the similarity value simVal between any one preselected grid event situation text and each of the remaining grid event situation texts, using the formula simVal = c·X1 + d·X2, where:
c = 0.9, c being a weight parameter;
d = 0.1, d being a weight parameter;
X1: cosine of the angle between the two vectors;
X2: normalized Euclidean distance between the two vectors;
(42) generating, according to the similarity results, N clusters, each containing all grid event situation texts similar to one preselected grid event situation text;
(43) selecting the N preselected grid event situation texts corresponding to the N grid event situation text clusters as central texts;
(44) calculating the similarity value between any two of the N central texts, using the formula simVal = c·X1 + d·X2;
(45) if two central texts are similar, deleting the smaller of the two clusters containing them; if two central texts are not similar, retaining both clusters;
(46) outputting the M clusters generated by the initial clustering, where M is less than N, and arranging the M clusters in descending order;
(47) selecting the largest of the M clusters as the initial cluster, and calculating, in descending order, the overlap value between each remaining cluster and the initial cluster;
(48) manually setting the overlap threshold to 0.1; if the overlap value is larger than 0.1, the cluster is considered similar to the initial cluster and is deleted, and the method returns to step (47);
(49) if the overlap value is smaller than the overlap threshold, the cluster is considered dissimilar to the initial cluster, the cluster is retained, and the initial cluster is changed;
repeating steps (47), (48) and (49) until all remaining clusters have been traversed.
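The secondary de-duplication of sub-steps (46) to (49) can be sketched as a single pass over the clusters in descending size order. The source does not define how the overlap value is computed, so the measure below (shared members divided by the size of the smaller cluster) is an assumption. Reading "the initial cluster is changed" as "the retained cluster becomes the new reference" is also an assumption, and the names `overlap` and `deduplicate` are hypothetical.

```python
from typing import List, Set


def overlap(a: Set[int], b: Set[int]) -> float:
    """ASSUMED overlap measure: shared members / size of the smaller cluster."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def deduplicate(clusters: List[Set[int]], threshold: float = 0.1) -> List[Set[int]]:
    # (46): arrange the M clusters in descending order of size.
    ordered = sorted(clusters, key=len, reverse=True)
    if not ordered:
        return []
    # (47): the largest cluster is the initial (reference) cluster.
    reference = ordered[0]
    kept = [reference]
    for candidate in ordered[1:]:
        # (48): overlap above the threshold means the candidate duplicates the reference.
        if overlap(candidate, reference) > threshold:
            continue
        # (49): otherwise keep it; ASSUMED reading: it becomes the new reference.
        kept.append(candidate)
        reference = candidate
    return kept


# Clusters given as sets of text indices (invented data).
m_clusters = [{1, 2, 3, 4, 5}, {4, 5, 6}, {7, 8}, {7, 9}]
k_clusters = deduplicate(m_clusters)
print(k_clusters)
```

In this toy run, `{4, 5, 6}` shares two of its three members with the largest cluster and is dropped, while `{7, 8}` is retained and then causes `{7, 9}` to be dropped.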
Step (6) is specifically: arranging the K grid event type clusters of different types in descending order and identifying the high-occurrence grid event types.
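Step (6) amounts to a ranking. The sketch below assumes the descending order is by cluster size (number of texts per cluster), which the source does not state explicitly. The cluster labels and text ids are invented for illustration.

```python
from typing import Dict, List, Tuple


def rank_event_types(clusters: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Return (label, size) pairs, largest cluster first."""
    counts = [(label, len(texts)) for label, texts in clusters.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Hypothetical cluster labels mapped to the ids of their member texts.
type_clusters = {
    "neighbour disputes": ["t1", "t2", "t3"],
    "fire safety": ["t4", "t5", "t6", "t7", "t8"],
    "food hygiene": ["t9"],
}
ranking = rank_event_types(type_clusters)
print(ranking[0])
```

The head of the ranking is the candidate high-occurrence type for key monitoring and early warning.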
It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and all equivalent modifications and substitutions based on the above-mentioned technical solutions are within the scope of the present invention as defined in the claims.

Claims (6)

1. A single-dimensional clustering analysis method for classified and graded treatment of grid events, characterized by comprising the following steps:
(1) acquiring grid event records;
(2) cleaning the acquired grid event records and keeping only the grid event situation text;
(3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors;
(4) clustering the grid event situation data with a single-dimensional clustering algorithm;
(5) generating K grid event type clusters of different types and storing them in HBase;
(6) identifying the high-occurrence grid event types.
2. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (1) is specifically: extracting grid event records from a multi-source database.
3. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (2) is specifically: cleaning the extracted grid event records with an ETL tool, removing fields such as the event location and the event time, and keeping only the event situation text.
4. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (3) is specifically: inputting a grid event situation text, calculating the weight values of the text with a BERT model, and outputting a multi-dimensional semantic feature vector of the text.
5. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (4) comprises the following sub-steps:
(41) calculating the similarity value simVal between any one preselected grid event situation text and each of the remaining grid event situation texts, using the formula simVal = c·X1 + d·X2, where:
c: weight parameter, preset value range 0.8-0.9;
d: weight parameter, preset value range 0.1-0.2;
X1: cosine of the angle between the two vectors;
X2: normalized Euclidean distance between the two vectors;
(42) generating, according to the similarity results, N clusters, each containing all grid event situation texts similar to one preselected grid event situation text;
(43) selecting the N preselected grid event situation texts corresponding to the N grid event situation text clusters as central texts;
(44) calculating the similarity value simVal between any two of the N central texts, using the formula simVal = c·X1 + d·X2, where:
c = 0.9, c being a weight parameter;
d = 0.1, d being a weight parameter;
X1: cosine of the angle between the two vectors;
X2: normalized Euclidean distance between the two vectors;
(45) if two central texts are similar, deleting the smaller of the two clusters containing them; if two central texts are not similar, retaining both clusters;
(46) outputting the M clusters generated by the initial clustering;
(47) performing secondary de-duplication on the output M clusters, wherein M is smaller than N.
6. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (6) is specifically: sorting the K grid event type clusters of different types in descending order and identifying the high-occurrence grid event types.
CN201911143455.8A 2019-11-20 2019-11-20 Single-dimensional clustering analysis method for classified and graded treatment of grid events Pending CN111723136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911143455.8A CN111723136A (en) 2019-11-20 2019-11-20 Single-dimensional clustering analysis method for classified and graded treatment of grid events

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911143455.8A CN111723136A (en) 2019-11-20 2019-11-20 Single-dimensional clustering analysis method for classified and graded treatment of grid events

Publications (1)

Publication Number Publication Date
CN111723136A (en) 2020-09-29

Family

ID=72563921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911143455.8A Pending CN111723136A (en) 2019-11-20 2019-11-20 Single-dimensional clustering analysis method for classified and graded treatment of grid events

Country Status (1)

Country Link
CN (1) CN111723136A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239436A (en) * 2014-08-27 2014-12-24 南京邮电大学 Network hot event detection method based on text classification and clustering analysis
CN108932311A (en) * 2018-06-20 2018-12-04 天津大学 The method of incident detection and prediction
CN109508379A (en) * 2018-12-21 2019-03-22 上海文军信息技术有限公司 A kind of short text clustering method indicating and combine similarity based on weighted words vector
CN110263169A (en) * 2019-03-27 2019-09-20 青岛大学 A kind of focus incident detection method based on convolutional neural networks and keyword clustering

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559844A (en) * 2020-12-17 2021-03-26 北京邮电大学 Natural disaster public opinion analysis method and device
CN115168345A (en) * 2022-06-27 2022-10-11 天翼爱音乐文化科技有限公司 Database classification method, system, device and storage medium
CN115168345B (en) * 2022-06-27 2023-04-18 天翼爱音乐文化科技有限公司 Database classification method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination